Making E-books
12-29-2008, 08:40 PM, (This post was last modified: 12-29-2008, 08:43 PM by Ratiocinator.)
#1
Making E-books
I am trying to make E-books by scanning hard copy books. I have Abbyy FineReader 9.0 OCR software and am using a typical flatbed scanner.

The problem I have is that of page cropping/size. Please see the example I have just uploaded and linked to, below, produced using FineReader, to see what I mean (I tried to attach it to this thread, but the forum software would not let me. Has this function been disabled?). The PDF page is much larger than the area of text/book page. I have no idea how to crop this PDF page to make it the ideal size (I would have thought this would be done automatically).

I have Googled for an answer, and looked through the user’s guide, but found no help. FineReader only has an image crop function, that I can see.

At the end of all the scanning, I will also need to join all of the PDFs to form the finished E-book. I do not know how to do that, either.

Example pages (two pages split automatically. At least I found out how to work the split page function):

http://download166.mediafire.com/zxnawdwpg...rydqhqw/150.pdf

Edit: OK, so the attachment function does work, after all!
Funny how people insist that death, murder, torture has to be a part of ones dietary needs.....fucking nutters everywhere!!

--- Purple Bex.
Reply
12-30-2008, 12:40 AM,
#2
Making E-books
Two things: your OCR results look really good, so consider leaving the output as text or as formatted text. Secondly, it appears that you are scanning the whole surface of the scanner each time; there should be a setting to allow you to autoselect just the text area. You may have to do it by hand.
Reply
12-30-2008, 02:48 AM,
#3
Making E-books
Quote:I tried to attach it to this thread, but the forum software would not let me. Has this function been disabled?

No, but it depends on the file extension.
Reply
12-30-2008, 02:58 AM,
#4
Making E-books
What you need is software that reads the information as text, not as an image; then the output document will be either Word (editable) or PDF. If it is Word, you can then save it as PDF.

ya
Reply
12-30-2008, 03:22 AM,
#5
Making E-books
Those PDFs appear to have embedded text and not the scan images. At least, when I opened one in Acrobat viewer for Linux I was able to copy/paste straight into a text editor, which would not have been possible, I do not think, if they were images.
Reply
12-30-2008, 04:20 AM,
#6
Making E-books
Quote:Two things: your OCR results look really good, so consider leaving the output as text or as formatted text. Secondly, it appears that you are scanning the whole surface of the scanner each time; there should be a setting to allow you to autoselect just the text area. You may have to do it by hand.

Thanks. I took another look at the scan dialogue box; there is no option for text only, just for different sizes of scanning area.

I was able to improve matters by dragging a box over the book so that only that area is scanned. But I am not able to crop what I have scanned after the scanning. When I re-scanned this way, two pages together, after the auto splitting each page was automatically set to be a different size, which is far from ideal.

So my search for an answer continues!

Quote:What you need is software that reads the information as text, not as an image; then the output document will be either Word (editable) or PDF. If it is Word, you can then save it as PDF.

ya

That's what I am using, only it is not flexible enough, or doesn't seem to be anyway.

Quote:Those PDFs appear to have embedded text and not the scan images. At least, when I opened one in Acrobat viewer for Linux I was able to copy/paste straight into a text editor, which would not have been possible, I do not think, if they were images.

The text was saved to permit such a thing. It is not saved as an image. Perhaps it could be, then I may be able to crop? It is not clear, though.

Quote:
Quote:I tried to attach it to this thread, but the forum software would not let me. Has this function been disabled?

No, but it depends on the file extension.

The attachments section of the text input page indicated that no attachments had been saved/accepted. The preview page did not show them, either. They only appeared after I submitted my post/thread (the PDF).
Reply
12-30-2008, 04:57 AM,
#7
Making E-books
What did you use to create the PDFs, then, if you saved the OCR output as text?

Another thing you could do is save the scans directly as TIFFs (or another lossless graphic file format), crop them individually in an image editing application, and then run the edited pictures back through FineReader.
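To make the crop step a bit more concrete, here is a minimal Python sketch of how the text area could be found automatically. It is not anything FineReader does, as far as I know. It assumes the scan is already loaded as a grid of grayscale values (0 = black, 255 = white) and that the area around the page is light; the names content_bbox and crop and the threshold of 128 are made up for illustration. In practice you would load the TIFF with an image library first.

```python
def content_bbox(pixels, threshold=128):
    """Return (left, top, right, bottom) enclosing every pixel darker than
    threshold. right and bottom are exclusive. Returns None for a blank page.

    pixels is a list of rows; each row is a list of grayscale values 0..255.
    """
    # Rows and columns that contain at least one dark (ink) pixel.
    rows = [y for y, row in enumerate(pixels)
            if any(v < threshold for v in row)]
    if not rows:
        return None
    cols = [x for x in range(len(pixels[0]))
            if any(row[x] < threshold for row in pixels)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def crop(pixels, box, margin=0):
    """Cut box out of pixels, widened by margin but kept inside the image."""
    left, top, right, bottom = box
    height, width = len(pixels), len(pixels[0])
    left = max(0, left - margin)
    top = max(0, top - margin)
    right = min(width, right + margin)
    bottom = min(height, bottom + margin)
    return [row[left:right] for row in pixels[top:bottom]]


# Tiny stand-in for a scan: 5 rows x 6 columns, white with two dark pixels.
page = [[255] * 6 for _ in range(5)]
page[1][2] = 0
page[3][4] = 0

box = content_bbox(page)
cropped = crop(page, box)
```

A dark scanner background (lid open) or specks of dust would throw this off, so treat it as a starting point only.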

It does appear to be robust, though:
Quote:Easy Image Enhancements
All image pre-processing functions, such as Rotate, Deskew, Crop, and Invert Image, can be quickly accessed from a single “Edit Image” window. Users can easily apply changes made to a single image page to all pages of a document with one click
If you can figure out how that works, it sounds like it will save you a lot of trouble. I cannot be too specific as I do not use Windows; I had a copy of FineReader 5 a few years ago, but times change.

One thing I seem to recall is that you could select two separate areas and it would keep them separate so select one page as one area and the facing page as another. I might be wrong though.
Reply
07-10-2010, 05:50 AM,
#8
RE: Making E-books
Some good pointers and an overview here:

HOW TO SCAN A BOOK
http://www.proportionalreading.com/scan.html

.. might be a bit outdated (1996) but the basic methodology still holds true.

Ouch, I just tried Simple OCR (Free) and it was rife with errors. The Image2PDF DLL crashed when I used it (Vista32). Still looking for some decent OCR freeware for this box. Suggestions?
There are no others, there is only us.
http://FastTadpole.com/
Reply
07-10-2010, 11:17 AM,
#9
RE: Making E-books
Ratiocinator

#1.
Quote:The PDF page is much larger than the area of text/book page. I have no idea how to crop this PDF page to make it the ideal size (I would have thought this would be done automatically).

#2.
Quote:When I re-scanned this way, two pages together, after the auto splitting each page was automatically set to be a different size, which is far from ideal.

I haven't looked at your pdf so the following may be out of order but you seem to be saying two different things:-

a) the page image is bigger than the original page so it includes blank background from the scanner platen;

b) when you crop pages in FR9 (manually or by the auto process you mentioned) the pages are not uniformly sized so if you were to review the images successively at speed it would be like a zoetrope with the page jumping around, reflecting the fact that each page had not been sized to the same h x w dimensions.

My personal opinion is that first, it's better when saving as PDF to save the page image and under it, the OCRd editable text - at least when a copy of the original page is required for (some) assurance of authenticity, unless the original physical page print quality is so bad that it is hard to read; or otherwise, that the resulting PDF is too many MB for easy transfer.

I've never scanned in FR9 (I assume it employs some system common TWAIN driver(s)) but have OCRd quite a bit.

Your questions:

#1. -- as has been pointed out, this is generally sorted at the scanning/capture/acquire image stage - the scanner software/TWAIN driver should allow you in preview mode to size the part of the scanner platen that is being scanned (i.e. the part covered by your book page), nowadays by dragging height and width indicator lines with the mouse rather than inputting dimensions as numerals. That preview process will usually also allow you to scale the acquired image, normally the page is scanned at 100% but even scan+ OCR software I used 10+ years ago allowed image acquisition to be set at e.g. 50% so the acquired page image was half the original physical page scale.
However, once the image has been acquired, FR9 like many other OCR programs allows you to edit the image acquired, e.g. to remove black shaded borders, remove marks and so on, before the OCR process is applied to acquire editable text copy. As has been remarked, in FR9 you access that utility by going to Page / Edit Page Image and selecting e.g. Crop, Eraser etc., as you have found out.
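For a feel of the numbers involved, here is a rough Python sketch of the arithmetic relating scan area, resolution and scale to image size, and image size to PDF page size. The only hard fact in it is that a PDF point is 1/72 inch. The function names, the 6 x 9 inch book page and the 8.5 x 11 inch scan area are assumptions for illustration, and I am assuming the PDF page is simply sized to the image at its scan resolution, which may not be exactly what FR9 does.

```python
def scan_area_pixels(width_in, height_in, dpi=300, scale=1.0):
    """Pixel dimensions of the image acquired from a region of the platen.

    width_in and height_in are the region size in inches; scale is the
    acquisition scale (1.0 = 100%, 0.5 = 50%).
    """
    return (round(width_in * dpi * scale), round(height_in * dpi * scale))


def pdf_page_points(width_px, height_px, dpi=300):
    """Page size in PDF points (1/72 inch) for an image shown at dpi."""
    return (width_px * 72 / dpi, height_px * 72 / dpi)


# Scanning only the book page versus a whole letter-sized area:
book_page = scan_area_pixels(6, 9)             # (1800, 2700)
whole_area = scan_area_pixels(8.5, 11)         # (2550, 3300)
half_scale = scan_area_pixels(6, 9, scale=0.5)  # (900, 1350)

# PDF page size for the book-page scan at 300 dpi:
page_size = pdf_page_points(*book_page)        # (432.0, 648.0)
```

So a scan limited to the book page gives a noticeably smaller image, and a correspondingly smaller PDF page, than a scan of the whole platen.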

#2. -- This to me is the more interesting question: in FR9 specifically, how do you get your pages to a uniform size, h x w? I don't know the answer to that and would be grateful if someone who *does* know could post. I haven't yet checked the links, so if the answer is there and anyone has read them, maybe they can point out the specific URL that deals with this? It may be hidden somewhere in the other info already discussed, e.g. in FR9's Page/Edit Page menu the Crop function allows you to apply the cut to all pages uniformly, but typically that won't work: the physical page won't have been in exactly the same spot on the scanner platen each time an image was acquired, so depending on the size of the blank margins round the page text etc., if you use this tool you may get some pages with bits of the edge removed that shouldn't be...
The other related idea is to investigate FR9's main screen window 2: Image. Click on the page image in that window and then select Image Properties underneath it, which will include a statement of the Height x Width and also the Resolution. Maybe playing with the Resolution will help (does this also change h x w?), although if it is reduced to under 300dpi it will potentially make the editable text less accurate, necessitating more spell-checking and editing. (Incidentally, I find FR9's spell check never gets all the errors, so after spell-checking I just save as e.g. Word, then spell check in Word, and when I find/correct an error in Word I manually make the same change in the FR9 editable text. NB this post has not been spell-checked :) ).
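On the uniform size question (#2), one approach outside FR9 is to give every page a crop window of the same h x w but let the window follow the text on each page, rather than cutting the same spot on every scan. Here is a minimal Python sketch of that idea. It assumes you already know the bounding box of the text on each scan (in pixels) and that all scans share the same image size; uniform_crop_boxes and its parameters are names I made up, and this is not something FR9 offers as far as I know.

```python
def uniform_crop_boxes(content_boxes, image_size, margin=0):
    """Give every page a crop window of identical size, centred on its text.

    content_boxes: one (left, top, right, bottom) per page, in pixels.
    image_size: (width, height) shared by all the scans.
    margin: blank border to keep around the largest text block.
    """
    img_w, img_h = image_size
    # Window size is the largest text block plus margins, capped at the image.
    win_w = min(img_w, max(r - l for l, t, r, b in content_boxes) + 2 * margin)
    win_h = min(img_h, max(b - t for l, t, r, b in content_boxes) + 2 * margin)
    boxes = []
    for l, t, r, b in content_boxes:
        # Centre the window on this page's text, then slide it back
        # inside the image if it would stick out past an edge.
        left = (l + r - win_w) // 2
        top = (t + b - win_h) // 2
        left = max(0, min(left, img_w - win_w))
        top = max(0, min(top, img_h - win_h))
        boxes.append((left, top, left + win_w, top + win_h))
    return boxes


# Two scans of 1000 x 1400 pixels where the page sat in slightly
# different spots on the platen.
text_areas = [(100, 120, 900, 1320), (140, 90, 920, 1300)]
windows = uniform_crop_boxes(text_areas, (1000, 1400), margin=20)
```

Both windows come out at 840 x 1250 pixels, so the pages should not jump around when flipped through, and no text is cut off even though the page moved between scans.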
Reply

