ConCen

Need help for scanning books
I have already scanned a number of books, and the biggest problem I faced was getting the text perfectly horizontal (I don't use OCR because I don't have the time to proofread).

The image editing software I use (Nero PhotoSnap) lets me rotate the image by a custom angle, but it's quite slow and never perfect.

There must be a way to do this correctly, but I don't know it.
Try this thread. ABBYY FineReader appears to be the tool of choice, but there is always a bit of editing to do after OCR.

Making E-books
http://concen.org/forum/showthread.php?tid=5758
Thanks, but I think I've found what I was looking for: a site dedicated to scanning books. They even have a forum:

http://www.diybookscanner.org/
Here are some more links in case they are of use to anyone; I compiled this list while searching. They include johnston's link to diybookscanner, which in turn links to Scan Tailor, a post-scan image cleaner:

http://www.freeocr.net/
Lists some free OCR programs for Windows. Some overlap with the ones mentioned below.

http://www.simpleocr.com/Info.asp
Simple OCR: takes common image files as input and outputs editable text; it won't take PDF, AFAIK.
I tried it on a sample page TIFF with fairly large (English) lettering, and the OCR accuracy was the same as FineReader Pro 9's, except for one minor misread that FR did not make. No spell check or other frills.

http://www.diybookscanner.org/forum/index.php
Plenty of general information and links.

http://scantailor.sourceforge.net/
...post-processing tool for scanned pages. It performs operations such as page splitting, deskewing, adding/removing borders, and others.

http://code.google.com/p/ocropus/
OCR engine - can't see any Windows binaries.

http://www.paperfile.net/
FreeOCR OCR Software V3.0, including Windows binaries of the free Tesseract OCR engine.
The installer wouldn't work on my Windows 2000 setup, even though Windows 2000 is listed as a supported OS, and I haven't retried since.
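To put a number on comparisons like the SimpleOCR vs FineReader test above, one common measure is the character error rate (CER): the edit distance between a proofread reference text and the OCR output, divided by the length of the reference. Below is a minimal Python sketch; the function names are hypothetical, and you still need one hand-checked page to serve as the reference.

```python
def levenshtein(a: str, b: str) -> int:
    """Fewest single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = cur
    return prev[-1]


def char_error_rate(reference: str, ocr_text: str) -> float:
    """Edit distance divided by reference length (0.0 means a perfect match)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_text) / len(reference)


# A classic OCR misread: "rn" recognised as "m".
print(char_error_rate("modern", "modem"))  # prints 0.3333333333333333
```

Running the same reference page through two engines and comparing their CERs gives a rough but repeatable accuracy comparison.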

Overall I'm sticking with ABBYY FineReader 9 Pro, and I don't think version 10 adds anything significant over 9 for the money. The program still has some basic unfriendly aspects. You can't enforce a uniform change (e.g. all the same font, no bold, no italics) across all the recognised text pages of a "FineReader Document"; you have to do it manually, page by page (yes, really). Also, when checking recognised text, the separate window that should show the portion of the scanned image the text was derived from doesn't always do so, and you have to leave the recognised-text window and scroll in the other one to find it.

And it (or was it FR ver. 6?) uses "batch" in the sense of "document job", unlike the venerable Textbridge, which used it in the more accepted DOS/Windows sense: an automatic batch mode that chews through a directory of images and spits out OCRd text files with the same names but .txt extensions. As far as I can see, ABBYY FR still lacks that. And there are some other issues.
Alternatives: you can't download a trial version of OmniPage, and Textbridge seems to be in less active (or no?) development, although I was getting good results saving as unformatted plain text 10+ years ago.
There are other commercial packages, but they don't seem much better, if at all, than the freeware that's coming along. There is also plenty more freeware, some of it going back years, since discontinued, and generally not great.
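The directory-style batch described above (every image in a folder in, a same-named .txt file out) is easy to script around a command-line OCR engine. Below is a minimal Python sketch, assuming the Tesseract command-line tool (the engine bundled with FreeOCR, mentioned above) is installed and on PATH; this is not a FineReader or Textbridge feature, and the function names and list of image extensions are hypothetical. It relies on `tesseract <image> <outputbase>` writing its text to `<outputbase>.txt`.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical set of extensions to treat as page scans.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def is_image(path: Path) -> bool:
    return path.suffix.lower() in IMAGE_SUFFIXES


def output_base(image: Path) -> Path:
    # Same folder, same name, extension dropped; tesseract appends ".txt".
    return image.with_suffix("")


def batch_ocr(images, run=subprocess.run):
    """OCR each image into a .txt file of the same name; return those paths."""
    written = []
    for image in sorted(Path(p) for p in images):
        if not is_image(image):
            continue  # skip non-image files such as notes or PDFs
        run(["tesseract", str(image), str(output_base(image))], check=True)
        written.append(image.with_suffix(".txt"))
    return written


if __name__ == "__main__":
    folder = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    for txt in batch_ocr(p for p in folder.iterdir() if p.is_file()):
        print("wrote", txt)
```

Run it with the scan folder as its only argument; the text files land next to the images. The `run` parameter only exists so the loop can be tried without Tesseract installed.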
Well, I used to have the same problem when trying to make the text horizontal. Actually, there is not much to it. All you need to do is get Adobe Fine Reader and do some research on how to do OCR with Adobe Fine Reader.
(12-28-2010 07:15 AM) kann wrote: Well, I used to have the same problem when trying to make the text horizontal. Actually, there is not much to it. All you need to do is get Adobe Fine Reader and do some research on how to do OCR with Adobe Fine Reader.

...it's "ABBYY FineReader". I don't think Adobe has anything to do with it, unless they've just gone on a corporate acquisition binge.

As for getting the text horizontal: if you mean the .tiff (or other) image scan of the paper page is askew, try Scan Tailor from the list above. I found it could straighten a seriously lopsided page that FineReader's auto-fix couldn't manage. If you're using FR, save the page in FR as a .tiff (uncompressed, or say Group 3/4 b&w or greyscale if the rest are that way), open it in Scan Tailor, make the adjustments and output from ST, then reimport the adjusted page into the FR Document. Worth it if the page image looks dumb as it is.
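For anyone curious how an automatic straighten can work at all, here is a minimal Python sketch of one well-known approach, the projection-profile method: try a range of small rotation angles and keep the one where the dark pixels pile up into the sharpest horizontal rows. This is not claimed to be what Scan Tailor or FineReader do internally. The input format (a list of (x, y) coordinates of dark pixels from an already black-and-white scan), the function names and the 5 degree / 0.1 degree search range are all assumptions for illustration.

```python
import math


def rotate_points(points, angle_deg):
    """Rotate (x, y) points about the origin by angle_deg degrees.

    Standard maths convention (counter-clockwise with y pointing up); with
    image coordinates, where y points down, positive angles turn clockwise.
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


def profile_score(points, angle_deg):
    """How sharply the dark pixels form horizontal rows after rotating.

    Pixels are binned by rounded y (a horizontal projection profile). Level
    text lines give a few tall bins, so the sum of squared counts is large;
    tilted lines smear across many bins and score lower.
    """
    bins = {}
    for _, y in rotate_points(points, angle_deg):
        key = round(y)
        bins[key] = bins.get(key, 0) + 1
    return sum(n * n for n in bins.values())


def estimate_skew(points, max_angle=5.0, step=0.1):
    """Return the rotation in degrees (within +/- max_angle) that best levels the text."""
    n = round(max_angle / step)
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda a: profile_score(points, a))


if __name__ == "__main__":
    # Synthetic "page": three 600-pixel-long text lines, then tilted by 2 degrees.
    page = [(x, y) for y in (0, 10, 20) for x in range(600)]
    tilted = rotate_points(page, 2.0)
    print(round(estimate_skew(tilted), 1))  # prints -2.0
```

The returned angle is the correction to apply, i.e. the custom rotation you would otherwise find by trial and error in an image editor. On a full-size scan you would probably want to subsample the dark pixels first, since this brute-force search touches every point for every candidate angle.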