Ocr
hello
I would like to digitize a book by taking photos of the pages and then performing OCR on them
can you tell me please what characteristics a camera must have to do this? big zoom? many megapixels? specific features?
OCR needs a 300dpi scan from a scanner, so can you tell me please what the equivalent is for a digital camera photo in terms of quality? I mean how many megapixels, what distance from the page (macro), how much lighting, zoom etc.
any specific camera settings? does the room need to be very well lit? do I need a tripod? any specific add-ons for the camera? any software?
any suggestion would be much appreciated
thanks
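For reference, the 300dpi question can be turned into a rough pixel count. The following is only a sketch: it assumes the page fills the frame exactly (it never quite does), and the function names and the 4000 x 3000 sensor in the usage note are made up for illustration, not taken from any camera mentioned in this thread.

```python
# Rough pixel budget for photographing a page instead of scanning it.
# Assumption: the page fills the frame exactly, with no margin around it.

MM_PER_INCH = 25.4


def pixels_needed(width_mm, height_mm, dpi):
    """Return (width_px, height_px, megapixels) for a page at the given dpi."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px, width_px * height_px / 1_000_000


def effective_dpi(sensor_long_px, sensor_short_px, width_mm, height_mm):
    """Resolution achieved when the page's long side runs along the sensor's long side."""
    long_in = max(width_mm, height_mm) / MM_PER_INCH
    short_in = min(width_mm, height_mm) / MM_PER_INCH
    # The tighter of the two axes limits the usable resolution.
    return min(sensor_long_px / long_in, sensor_short_px / short_in)


if __name__ == "__main__":
    w, h, mp = pixels_needed(210, 297, 300)  # one A4 page
    print(f"A4 at 300 dpi: {w} x {h} px, about {mp:.1f} MP")
    print(f"4000 x 3000 sensor on A4: about {effective_dpi(4000, 3000, 210, 297):.0f} dpi")
```

Under these assumptions an A4 page at 300dpi comes to 2480 x 3508 pixels, about 8.7 MP, and a hypothetical 4000 x 3000 sensor framed tightly on one A4 page gives a little over 340 dpi. Lens sharpness, focus and any margin around the page all lower the real figure.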
TheObiJuan
10-10-2007, 02:05 AM
Budget?
I would recommend a DSLR with a 50mm lens. They are used for this all the time.
This will allow you to minimize distortion, control white balance, etc.
MP and ZOOM are irrelevant, really.
Color balanced and controlled lighting would be ideal.
Consistency is what you need.
thanks for your reply
budget is definitely less than 300-500 USD
I was thinking of "FUJIFILM FinePix F50fd" or "Pentax Optio A40"
the predecessors of these have low barrel distortion and good image quality, so I suppose their successors do too
I can carry it in my pocket and take it to the library too
but I can't find a review of either
GaryS
10-10-2007, 06:24 AM
So you want a camera to take into a library so you can copy a book which you are not allowed to photocopy....
No help from me.... If I am misreading your needs, please explain why you can't just use a scanner.
no, in libraries we are allowed to photocopy books, my library has bought a license for this
as for scanners, have you ever tried to scan a book with a flatbed scanner? the edges are scanned so badly and the book is fatally damaged, not to mention it takes ages (many seconds per page)
most industrial strength book scanners (atiz.com, kirtas-tech.com) use cameras to capture, not scanners
SpecialK
10-19-2007, 11:57 PM
I can't imagine doing OCR via the camera. Even a slow scanner would be faster than handling all the images - after getting some type of book holder to hold the pages flat, and a pair of lights, so you could get a recognizable image. Is it even possible to OCR off a file? I have an Epson 4990 but have never tried that technique.
Perhaps there is some software that can import jpgs for OCR, but I doubt you'd save time or labor over conventional scanning.
I can't imagine doing OCR via the camera.
you may take a look here (http://www.mobileread.com/forums/showpost.php?p=106638&postcount=81)
Even a slow scanner would be faster than handling all the images
are you sure? a 300dpi scan of an A4 page takes at least 15-20 seconds (for low/middle priced scanners), while camera shooting takes milliseconds
(turning the pages of the book is the same time for both ways)
after getting some type of book holder to hold the pages flat, and a pair of lights, so you could get a recognizable image.
do you know any specific 'book holder'?
Is it even possible to OCR off a file?
what kind of file? and why not? what's the problem?
Perhaps there is some software that can import jpgs for OCR
what's wrong with importing JPGs for OCR? most OCR programs accept JPGs, I don't understand what you are talking about
maybe you mean a program that will automatically transfer the JPGs from the camera to the PC on the fly, as they are created? if yes, it already exists (http://www.breezesys.com/PSRemote/index.htm)
SpecialK
10-20-2007, 12:49 PM
you may take a look
Yes, I know it can be done.
are you sure? a 300dpi scan of an A4 page takes at least 15-20 seconds (for low/middle priced scanners), while camera shooting takes milliseconds
(turning the pages of the book is the same time for both ways)
Yes, but there are more steps involved in the file handling (clicking through a list of filenames, loading the image, etc.) that make the time involved quite a bit longer than "milliseconds."
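Much of that file handling can be scripted rather than clicked through. As a minimal sketch, assuming the camera writes sequentially numbered JPEGs into a single folder so that filename order equals shooting order (the function name and the example folder are hypothetical), the shots can be listed in page order and handed to whatever OCR program is in use:

```python
from pathlib import Path

JPEG_SUFFIXES = {".jpg", ".jpeg"}


def list_pages(folder):
    """Return [(page_number, path), ...] for every JPEG in folder, in filename order.

    Assumption: the camera numbers its files sequentially, so sorting by
    name reproduces the order in which the pages were photographed.
    """
    shots = sorted(
        (p for p in Path(folder).iterdir()
         if p.is_file() and p.suffix.lower() in JPEG_SUFFIXES),
        key=lambda p: p.name.lower(),
    )
    return list(enumerate(shots, start=1))


if __name__ == "__main__":
    # "." is a placeholder; point this at the folder the camera files were copied to.
    for page_number, path in list_pages("."):
        print(f"page {page_number:04d}: {path.name}")
```

Nothing is renamed or modified here; the function only builds the ordered list, so it is safe to run repeatedly on the same folder.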
do you know any specific 'book holder'?
Nope
what kind of file? and why not? what's the problem?
There is no problem, I was asking a question. I see it is quite possible as I just tested it.
what's wrong with importing JPGs for OCR? most OCR programs accept JPGs, I don't understand what you are talking about.
There is nothing wrong. I was just wondering if it was possible - it obviously is.
maybe you mean a program that will automatically transfer the JPGs from the camera to the PC on the fly, as they are created? if yes, it already exists (http://www.breezesys.com/PSRemote/index.htm)
That looks nice for $50, though it apparently works only with Canon PowerShot cameras.
It seems the information you asked for was already in the links you provided.
I've only ever scanned through a scanner, so I just asked some questions. I've found some answers, too. Thanks.
I never found OCR software all that great. The best way to do it was to give it a b/w image to work with. Bitmap or TIFF are best.
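Turning a photo into a b/w image mostly comes down to choosing a brightness threshold. Below is a minimal sketch of one common way to pick it, Otsu's method, applied to a plain list of 8-bit grayscale values. It is pure Python for illustration only; the function names are hypothetical, and a real page would normally be loaded and saved with an imaging library.

```python
def otsu_threshold(pixels):
    """Return the 0-255 level that best separates dark from light pixels (Otsu's method)."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    dark_count, dark_sum = 0, 0
    for level in range(256):
        dark_count += histogram[level]
        dark_sum += level * histogram[level]
        if dark_count == 0:
            continue  # no pixels at or below this level yet
        light_count = total - dark_count
        if light_count == 0:
            break  # every pixel is already in the dark class
        dark_mean = dark_sum / dark_count
        light_mean = (total_sum - dark_sum) / light_count
        # Between-class variance, up to a constant factor of 1 / total**2.
        variance = dark_count * light_count * (dark_mean - light_mean) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def to_black_and_white(pixels):
    """Map each pixel to 0 (ink) or 255 (paper); values at or below the threshold become ink."""
    threshold = otsu_threshold(pixels)
    return [0 if value <= threshold else 255 for value in pixels]


if __name__ == "__main__":
    sample = [30, 40, 35, 200, 210, 220, 205, 215]  # three ink pixels, five paper pixels
    print(to_black_and_white(sample))
```

A single global threshold like this assumes even lighting across the page, which is one reason controlled lighting matters when shooting with a camera.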
SpecialK
10-20-2007, 01:25 PM
I never found OCR software all that great. The best way to do it was to give it a b/w image to work with. Bitmap or TIFF are best.
There are typos (reados?) to fix with any OCR, but it sure beats typing "everything" in, especially for poor typists like me.