What Perl module will help me extract the text from a PDF file?

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: What Perl module will help me extract the text from a PDF file? by dragonchild (Archbishop) on Sep 28, 2004 at 18:24 UTC
PDF::Parse is the one most often recommended. PDF::Reuse is also good, but it serves a different purpose. If you are willing to wade through a little code, PDF::API2 is the Swiss Army knife for Perl/PDF work. It does require some knowledge of the PDF format, though.
Being right, does not endow the right to be rude; politeness costs nothing. Being unknowing, is not the same as being stupid. Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence. Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.
I shouldn't have to say this, but any code, unless otherwise stated, is untested
Re: What Perl module will help me extract the text from a PDF file? by buckaduck (Chaplain) on Sep 28, 2004 at 18:24 UTC
Try some of the solutions in this node.
buckaduck
Re: What Perl module will help me extract the text from a PDF file? by mifflin (Curate) on Sep 28, 2004 at 18:04 UTC
See the node "Reading PDF Files?"
Re^2: What Perl module will help me extract the text from a PDF file? by Anonymous Monk on Sep 28, 2004 at 18:15 UTC
Yeah, the monk in that thread asked the same question and got no answer.
Re^3: What Perl module will help me extract the text from a PDF file? by mifflin (Curate) on Sep 28, 2004 at 18:20 UTC
Then try this node: pdf2doc2rtf2html2txt
Re: What Perl module will help me extract the text from a PDF file? by CountZero (Bishop) on Sep 28, 2004 at 19:43 UTC
Some PDF files are just pictures, so you cannot extract any text from them unless you are prepared to use OCR techniques. It all depends on how the PDF file was generated.
CountZero
"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
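For what it's worth, here is a rough, untested-in-anger sketch of how you might tell which kind of PDF you have. It does not use any of the modules named in this thread. It assumes the external pdftotext command-line tool (from Xpdf or Poppler) is installed and on your PATH. The sub names looks_like_text and extract_text, and the threshold of 20 word characters, are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Heuristic: treat the extraction as real text if it contains at least
# $min word characters. The default of 20 is an arbitrary assumption.
sub looks_like_text {
    my ($text, $min) = @_;
    $min = 20 unless defined $min;
    return 0 unless defined $text;
    my $count = () = $text =~ /\w/g;   # count word characters
    return $count >= $min ? 1 : 0;
}

# Run the external pdftotext tool and return everything it prints.
# The trailing '-' asks pdftotext to write to STDOUT instead of a file.
sub extract_text {
    my ($file) = @_;
    open(my $fh, '-|', 'pdftotext', $file, '-')
        or die "Cannot run pdftotext: $!";
    local $/;                          # slurp the whole output at once
    my $text = <$fh>;
    close($fh);
    return defined $text ? $text : '';
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my $text = extract_text($ARGV[0]);
    my $out  = looks_like_text($text)
        ? $text
        : "No text layer found; this PDF probably needs OCR.\n";
    print $out;
}
```

Save it under any name you like and pass the PDF as the only argument. If the file is a scanned image, pdftotext typically returns little more than page breaks, which is what the heuristic is meant to catch.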

