ChronoScan - Document Capture Software - Lazy OCR? Why? - ChronoScan Forum

Lazy OCR? Why?

14 Aug 2020 22:30 #3277 by gyoung
Lazy OCR? Why? was created by gyoung
Hi Gabriel,

We are one of your dealers and do a lot of ChronoScan configuration for our customers, including very complex scripting. One thing about ChronoScan that is very disappointing, wastes a lot of time, and makes ChronoScan inconsistent and unreliable is the OCR performance.

It seems that in any batch, the OCR function (either as a process step or run independently) gives very poor results on some images, seemingly at random. Out of 500 images, 450 will be fine and 50 will be useless. And image quality is not the issue per se: all 500 images can be considered equal in quality.

The kicker is... if I run the "Rebuild OCR" on these 50 failed images... they come back perfect! WHY???

Why can't ChronoScan perform the "good" OCR in the first place? I have scoured the program and the documentation for an answer and have found nothing. What makes "Rebuild OCR" better than the first attempt, and how can I force the OCR to run at the high level as part of the automatic process? Most customers have opted for the Nuance module to gain better OCR, and are disappointed with these results. Let me know if there is a solution.

Thanks,

Gerald

14 Aug 2020 23:29 #3278 by Gsimao
Replied by Gsimao on topic Lazy OCR? Why?
Gerald,
Usually, that happens when you have scanned documents generated by other software that performs its own OCR and sets the result as the text layer of the resulting PDF file. Most of the time, the OCR from ChronoScan will be far superior to that performed by such applications, even without the Nuance plugin. One option is to set "Extract Text from PDF File" to false on your input operations. This will allow ChronoScan to perform its own OCR.
ChronoScan most likely can't perform the good OCR because your PDF documents already come with their own OCR layer. When you use the "Rebuild OCR" function, you are forcing ChronoScan to perform OCR on the document, thus replacing the PDF text layer. If you prevent ChronoScan from importing that PDF text layer in the first place, by setting "Extract Text from PDF File" to false in your input settings, you will get the good OCR right away.
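To check whether the failing files really do arrive with an embedded text layer, a quick test outside ChronoScan can help. The following is a minimal Python sketch, not a ChronoScan feature or API. It assumes that a PDF referencing a `/Font` resource most likely carries text (for example a hidden OCR layer), while a pure image scan usually does not. The function name `pdf_probably_has_text_layer` is made up for illustration, and the check is a crude heuristic; a dedicated PDF library would be more reliable.

```python
import re
import zlib

# Matches the raw bytes between a "stream" keyword and its "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def pdf_probably_has_text_layer(data: bytes) -> bool:
    """Heuristic (assumption): a /Font reference suggests the PDF contains text."""
    # Case 1: the font reference is visible in the uncompressed part of the file.
    if b"/Font" in data:
        return True
    # Case 2: the reference may sit inside a Flate-compressed stream,
    # so try to inflate each stream and look again.
    for match in STREAM_RE.finditer(data):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not zlib data, e.g. JPEG image bytes
        if b"/Font" in inflated:
            return True
    return False
```

Reading a file in binary mode and passing its bytes to the function is enough to try it. If the 50 failed images come back `True` and the 450 good ones come back `False`, that would support the text-layer explanation; if all of them come back `False`, the cause is probably elsewhere.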
Let me know if that helps.

17 Aug 2020 16:45 #3279 by gyoung
Replied by gyoung on topic Lazy OCR? Why?
Hi Gabriel,

I agree that when ChronoScan does perform the OCR properly, it is superior to many other applications. My frustration is that ChronoScan doesn't always appear to do what is asked of it.

We always set "Extract Text from PDF File" on our input operations to false. See attached image. I've also included the process settings for you to see.

Many times, there is no OCR performed by ChronoScan at all! See attached example. So this behaviour of ChronoScan "deciding" not to perform OCR on documents is where we have no answers, no logic, and an unreliable tool.
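Until the cause is known, pages that came through with no OCR text could at least be caught automatically instead of by eye. The following is a rough Python sketch of such a check. It assumes the OCR text of each page has already been exported into a dictionary keyed by a page identifier; how that export is done is not covered here, and the name `find_suspect_pages` and the `min_chars` threshold are hypothetical.

```python
def find_suspect_pages(ocr_text_by_page, min_chars=20):
    """Return the ids of pages whose OCR text is empty or nearly empty.

    ocr_text_by_page: dict mapping a page id to its OCR text (assumed input).
    min_chars: minimum number of letters/digits expected on a real page
               (an arbitrary threshold, to be tuned per document type).
    """
    suspects = []
    for page_id, text in ocr_text_by_page.items():
        # Count only letters and digits so stray punctuation or
        # whitespace does not make an empty page look healthy.
        meaningful = sum(1 for ch in text if ch.isalnum())
        if meaningful < min_chars:
            suspects.append(page_id)
    return suspects


batch = {
    "doc1_p1": "Invoice 10023 dated 14 Aug 2020, total due 450.00",
    "doc1_p2": "",
    "doc2_p1": " \n .,",
}
print(find_suspect_pages(batch))
```

The flagged pages are the candidates for a second pass with "Rebuild OCR". A genuinely blank page will also be flagged, so the list still needs a quick review.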

There are many other issues, possibly for another thread, but I'll mention one here that you may have a quick answer for: when a document is processed and still appears rotated left or right, or upside down, ChronoScan will not respond to manually rotating the document. See attached.

Here is one thought: is ChronoScan limited to small batches in order to work reliably? Perhaps the issues weren't as apparent in small batches, where maybe one document failed and we attributed it to a poor image. But now, with large batches for this customer (close to 1000 images in an import of 100 documents), the failed processes point to an issue with ChronoScan and not to poor image quality.

I know there's a lot of information here, but we have been working with ChronoScan for three years and include it in almost every document solution we sell. I need to understand why ChronoScan is doing what it does. If the solution is to bypass all of the GUI processing and script the entire process from beginning to end, so be it. Any help is appreciated.

Thanks,

Gerald
