PerlMonks
Hi!
It's in the nature of PDF that text isn't stored as a sequence of letters; each letter may be positioned in the document separately, so the order of the letters/words inside the .pdf file need not bear any relation to the order in which the text appears on screen. This makes parsing .pdf files extremely difficult.

I used a program called pdftext.exe (which works quite well at extracting whole words, at least in most cases) and post-processed the result with Perl. Maybe it's worth a try for you as well...

HTH, Rata

In reply to Re: Extract text from PDF (normal text)
by Ratazong
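
To illustrate the point about letters being positioned separately, here is a toy sketch in Python (not the pdftext.exe approach, and not a real PDF parser). It assumes a hypothetical list of `(x, y, char)` glyphs in arbitrary file order, with y growing downward and every glyph one unit wide; the function name `reading_order` and the parameters `line_tolerance` and `word_gap` are made up for this example. Real PDFs add fonts, transforms, and varying glyph widths, which is where the difficulty comes from.

```python
def reading_order(glyphs, line_tolerance=2.0, word_gap=1.5):
    """Rebuild text from (x, y, char) glyphs given in arbitrary order.

    Assumptions (hypothetical, for illustration only): y grows downward,
    glyphs on the same line have nearly equal y, and a horizontal jump
    larger than word_gap between neighbouring glyphs is a word break.
    """
    lines = []  # each entry: (y of the line's first glyph, [(x, char), ...])
    for x, y, ch in sorted(glyphs, key=lambda g: (g[1], g[0])):
        if lines and abs(lines[-1][0] - y) <= line_tolerance:
            lines[-1][1].append((x, ch))
        else:
            lines.append((y, [(x, ch)]))

    out = []
    for _, items in lines:
        items.sort()  # left to right within the line
        text = ""
        prev_x = None
        for x, ch in items:
            if prev_x is not None and x - prev_x > word_gap:
                text += " "
            text += ch
            prev_x = x
        out.append(text)
    return "\n".join(out)


# Glyphs stored in a scrambled order, as they might be inside a .pdf file.
scrambled = [
    (4, 10, "D"), (0, 20, "o"), (1, 10, "i"), (5, 10, "F"),
    (0, 10, "H"), (1, 20, "k"), (3, 10, "P"),
]
print(reading_order(scrambled))
```

The thresholds are guesses that happen to suit this toy data; with real documents they would need tuning per font size, which is part of why a dedicated extraction tool plus post-processing is the more practical route.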