PerlMonks
Re: parse content of PDF file by marto (Cardinal)
on Aug 03, 2007 at 13:55 UTC (id://630513)
Had they been converted to PDF via Acrobat (or suchlike) rather than being scanned images, I would have suggested looking at CAM::PDF. However, I think you are going to have to OCR each page of each document, since IIRC there won't be any (meaningful) text to parse within the PDF. You may want to start by looking at PDF::OCR (which IIRC uses Tesseract), or some other OCR module from CPAN.
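A minimal sketch of that approach, under a few assumptions. It uses CAM::PDF's `new`, `numPages` and `getPageText` methods to try the text layer first. The `needs_ocr` helper and its 20-character threshold are hypothetical, an arbitrary guess at what counts as "meaningful" text. Instead of PDF::OCR, whose interface isn't covered here, it shells out to poppler's `pdftoppm` and the `tesseract` command-line tool, both assumed to be on your PATH.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical heuristic (an assumption, not a rule): a page whose extracted
# text has fewer than $min_chars word characters is treated as a scanned image.
sub needs_ocr {
    my ($text, $min_chars) = @_;
    $min_chars //= 20;
    return 1 unless defined $text;
    my $word_chars = () = $text =~ /\w/g;    # count the \w matches
    return $word_chars < $min_chars ? 1 : 0;
}

# OCR one page: rasterize it with pdftoppm, then run the tesseract CLI on
# the image. Both external tools are assumed to be installed.
sub ocr_page {
    my ($file, $page) = @_;
    my $dir = tempdir(CLEANUP => 1);
    my $img = "$dir/page";
    system('pdftoppm', '-r', 300, '-f', $page, '-l', $page,
           '-png', '-singlefile', $file, $img) == 0
        or die "pdftoppm failed on page $page\n";
    # tesseract writes its result to "$dir/out.txt"
    system('tesseract', "$img.png", "$dir/out") == 0
        or die "tesseract failed on page $page\n";
    open my $fh, '<', "$dir/out.txt" or die "Cannot read OCR output: $!\n";
    local $/;    # slurp mode: read the whole file at once
    return scalar <$fh>;
}

# Use CAM::PDF's text layer where there is one; fall back to OCR otherwise.
sub page_texts {
    my ($file) = @_;
    require CAM::PDF;    # loaded lazily so needs_ocr() works without it
    my $pdf = CAM::PDF->new($file)
        or die "Cannot open $file: $CAM::PDF::errstr\n";
    my @texts;
    for my $page (1 .. $pdf->numPages()) {
        my $text = $pdf->getPageText($page);
        push @texts, needs_ocr($text) ? ocr_page($file, $page) : $text;
    }
    return @texts;
}

if (@ARGV) {
    my @texts = page_texts($ARGV[0]);
    for my $i (0 .. $#texts) {
        print "=== page ", $i + 1, " ===\n", $texts[$i] // '', "\n";
    }
}
```

Run it as `perl ocr_pdf.pl scanned.pdf` (the script name is just an example). Checking the text layer first means any pages that were produced digitally skip the much slower OCR step, and only the scanned ones go through Tesseract.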
Check out the code.google page for tesseract-ocr.

Update: Added link to tesseract-ocr.

Hope this helps

Martin
In section: Seekers of Perl Wisdom