PerlMonks  

What Perl module will help me extract the text from a PDF file?

by Anonymous Monk
on Sep 28, 2004 at 17:55 UTC ( [id://394699]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

So many Perl PDF modules to choose from. Which is the right one for extracting text from a PDF file? Thanks!

Replies are listed 'Best First'.
Re: What Perl module will help me extract the text from a PDF file?
by dragonchild (Archbishop) on Sep 28, 2004 at 18:24 UTC
    PDF::Parse is the one most often recommended. PDF::Reuse is also good, but for a different purpose. If you are willing to wade through a little code, PDF::API2 is the Swiss Army knife for Perl/PDF work. It does require some knowledge of the PDF format, though.

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

    I shouldn't have to say this, but any code, unless otherwise stated, is untested

Re: What Perl module will help me extract the text from a PDF file?
by buckaduck (Chaplain) on Sep 28, 2004 at 18:24 UTC
    Try some of the solutions in this node.

    buckaduck

Re: What Perl module will help me extract the text from a PDF file?
by mifflin (Curate) on Sep 28, 2004 at 18:04 UTC
      Yeah, the monk in that thread asked the same question and got no answer.
Re: What Perl module will help me extract the text from a PDF file?
by CountZero (Bishop) on Sep 28, 2004 at 19:43 UTC
    Some PDF files are just pictures, so you cannot extract any text from them unless you use OCR techniques. I guess it all depends on how the PDF file was generated.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
