OCR


user
10-10-2007, 01:21 AM
hello

I would like to digitize a book by taking photos of the book pages and then performing OCR on them

can you please tell me what characteristics a camera must have to do this? big zoom? many megapixels? specific features?

OCR needs a 300dpi scan from a scanner, so can you please tell me what the equivalent is for a digital camera photo in terms of quality? I mean how many megapixels, what distance from the page (macro), how much lighting, zoom, etc.
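
The 300dpi figure can be turned into a pixel count with simple arithmetic. The sketch below is only an illustration: it assumes an A4 page (210 x 297 mm) that fills the whole frame, and the function names are made up for this example. It ignores lens sharpness, focus and lighting, which also affect OCR quality.

```python
MM_PER_INCH = 25.4

def pixels_for_dpi(width_mm, height_mm, dpi):
    """Return (width_px, height_px, megapixels) needed to cover a page at `dpi`."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px, width_px * height_px / 1e6

def effective_dpi(sensor_width_px, sensor_height_px, width_mm, height_mm):
    """DPI achieved when the page fills the frame; the tighter axis is the limit."""
    return min(sensor_width_px / (width_mm / MM_PER_INCH),
               sensor_height_px / (height_mm / MM_PER_INCH))

# Assumption: A4 page, 300 dpi target.
w, h, mp = pixels_for_dpi(210, 297, 300)
print(w, h, round(mp, 1))  # 2480 3508 8.7
```

Under these assumptions a single A4 page needs roughly 8.7 megapixels of page area. A 4:3 sensor does not match the shape of the page, so some of its pixels fall outside the page and the real requirement is somewhat higher; `effective_dpi` gives the figure for a particular sensor held in portrait orientation.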

any specific settings of the camera? does the room need to be very well lit? do I need a tripod? any specific add-ons for the camera? any software?
any suggestion would be much appreciated

thanks

TheObiJuan
10-10-2007, 02:05 AM
Budget?

I would recommend a DSLR with a 50mm lens. They are used for this all the time.
This will allow you to minimize distortion, control white balance, etc.
MP and ZOOM are irrelevant, really.

Color balanced and controlled lighting would be ideal.
Consistency is what you need.

user
10-10-2007, 04:22 AM
thanks for your reply

budget is definitely less than 300-500 USD

I was thinking of "FUJIFILM FinePix F50fd" or "Pentax Optio A40"

the predecessors of these have low barrel distortion and good image quality, so I suppose their successors do too

I can have it in my pocket and take it in the library too

but I can't find a review of it

GaryS
10-10-2007, 06:24 AM
So you want a camera to take into a library so you can copy a book which you are not allowed to photocopy....

No help from me.... If I am misreading your needs, please explain why you can't just use a scanner.

user
10-10-2007, 07:58 AM
So you want a camera to take into a library so you can copy a book which you are not allowed to photocopy....

No help from me.... If I am misreading your needs, please explain why you can't just use a scanner.

no, in libraries we are allowed to photocopy books, my library has bought a license for this

as for scanners, have you ever tried to scan a book with a flatbed scanner? the edges are scanned so badly and the book is fatally damaged, not to mention it takes ages (many seconds per page)

most industrial strength book scanners (atiz.com, kirtas-tech.com) use cameras to capture, not scanners

SpecialK
10-19-2007, 11:57 PM
I can't imagine doing OCR via the camera. Even a slow scanner would be faster than handling all the images - after getting some type of book holder to hold the pages flat, and a pair of lights, so you could get a recognizable image. Is it even possible to OCR off a file? I have an Epson 4990 but have never tried that technique.

Perhaps there is some software that can import jpgs for OCR, but I doubt you'd save time or labor over conventional scanning.

user
10-20-2007, 05:54 AM
I can't imagine doing OCR via the camera.

you may take a look here (http://www.mobileread.com/forums/showpost.php?p=106638&postcount=81)

Even a slow scanner would be faster than handling all the images

are you sure? a 300dpi scan of an A4 page takes at least 15-20 seconds (for low/mid-priced scanners), while camera shooting takes milliseconds

(turning the pages of the book takes the same time either way)

after getting some type of book holder to hold the pages flat, and a pair of lights, so you could get a recognizable image.

do you know any specific 'book holder'?

Is it even possible to OCR off a file?

what kind of file? and why not? what's the problem?

Perhaps there is some software that can import jpgs for OCR

what's wrong with importing JPGs for OCR? most OCR programs accept JPGs, I don't understand what you are talking about

maybe you mean a program that will automatically transfer the JPGs from the camera to the PC on the fly, as they are created? if yes, it already exists (http://www.breezesys.com/PSRemote/index.htm)

SpecialK
10-20-2007, 12:49 PM
you may take a look
Yes, I know it can be done.


are you sure? a 300dpi scan of an A4 page takes at least 15-20 seconds (for low/mid-priced scanners), while camera shooting takes milliseconds

(turning the pages of the book takes the same time either way)

Yes, but there are more steps involved with the file handling (clicking through a list of filenames, loading the image, etc) that make the time involved quite a bit longer than "milliseconds."
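
The file-handling steps mentioned here do not have to be done by hand. As a rough sketch only, and assuming the photos sit in one folder with sequentially numbered names from the camera (something like the hypothetical `IMG_0001.JPG`), a short script can collect them in shooting order so they can be passed to whatever OCR program is used. The function names are invented for this example.

```python
import re
from pathlib import Path

def natural_key(path):
    """Split the file name into text and integer chunks so 'IMG_10' sorts after 'IMG_9'."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", path.name)]

def pages_in_order(folder, extensions=(".jpg", ".jpeg")):
    """Return the image files in `folder`, sorted in shooting order by file name."""
    files = [p for p in Path(folder).iterdir()
             if p.is_file() and p.suffix.lower() in extensions]
    return sorted(files, key=natural_key)
```

Calling `pages_in_order("some_folder")` returns a list of paths; looping over it with `enumerate(..., start=1)` gives a page number for each photo without clicking through filenames one by one.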



do you know any specific 'book holder'?

Nope



what kind of file? and why not? what's the problem?

There is no problem, I was asking a question. I see it is quite possible as I just tested it.



what's wrong with importing JPGs for OCR? most OCR programs accept JPGs, I don't understand what you are talking about.

There is nothing wrong. I was just wondering if it was possible - it obviously is.

maybe you mean a program that will automatically transfer the JPGs from the camera to the PC on the fly, as they are created? if yes, it already exists (http://www.breezesys.com/PSRemote/index.htm)

That looks nice for $50, though it apparently works only with Canon PowerShot cameras.

It seems the information you asked for was already in the links you provided.

I've only ever scanned through a scanner, so I just asked some questions. I've found some answers, too. Thanks.

Rhys
10-20-2007, 01:12 PM
I never found OCR software all that great. The best way to do it was to give it a b/w image to work with. BMP or TIFF are best.
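
One common way to get a b/w image out of a grayscale photo is to pick a brightness threshold and push every pixel to black or white. The sketch below uses Otsu's method to choose that threshold; this is a technique chosen for illustration, not one named in this thread. It assumes the photo has already been loaded as a flat list of gray values from 0 to 255; reading and writing the image file is left out, and in practice an imaging library would handle that.

```python
def otsu_threshold(pixels):
    """Pick the gray level (0-255) that best separates dark ink from light paper."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0

    for level in range(256):
        weight_dark += histogram[level]
        if weight_dark == 0:
            continue  # no pixels at or below this level yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += level * histogram[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance (unnormalized); larger means better separation.
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold

def binarize(pixels):
    """Map every pixel to 0 (black) or 255 (white) using the Otsu threshold."""
    threshold = otsu_threshold(pixels)
    return [0 if value <= threshold else 255 for value in pixels]
```

A single global threshold like this works best with even lighting; a page photographed with shadows or a curved gutter may need a more local approach.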

SpecialK
10-20-2007, 01:25 PM
I never found OCR software all that great. The best way to do it was to give it a b/w image to work with. BMP or TIFF are best.


There are typos (reados?) to fix with any OCR, but it sure beats typing "everything" in, especially for poor typists like me.