PerlMonks  

Re: extract text from pdf

by mk. (Friar)
on Nov 08, 2006 at 13:10 UTC (#582872)


in reply to extract text from pdf

Have you tried File::Extract::PDF?
It uses CAM::PDF internally, but maybe you will have better luck with it.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*women.pm

Re^2: extract text from pdf
by jeteve (Pilgrim) on Nov 08, 2006 at 13:32 UTC
    I did try both of those, without success.

    I have a PDF that I created with OpenOffice. pdftotext is able to extract the text from it, whereas CAM::PDF (or File::Extract::PDF) gives me garbage characters:

    [jerome@saab pdf]$ getpdftext.pl -v ~/faxTaxHabitation2005.pdf
    ! " #  $  % # & ' ( "  ) * + + + ...
    And pdftotext:
    [jerome@saab pdf]$ pdftotext ~/faxTaxHabitation2005.pdf txt
    [jerome@saab pdf]$ tail txt
    Merci de bien vouloir me confirmer ces informations par retour de fax afin que je puisse proceder au paiment le plus rapidement possible au numero suivant :
    *************
    Cordiales salutations.
    ...

    The ideal would be a Perl module linked against the xpdf C code. :)

    -- Nice photos of naked perl sources here !
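
Until such a module exists, one workaround is to shell out to pdftotext and read its output. Below is a minimal sketch, assuming pdftotext (from xpdf or poppler-utils) is installed. The function name pdf_to_text and the PDFTOTEXT override variable are made up for illustration; they are not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper, not from the thread.
# PDFTOTEXT lets the caller point at a specific binary;
# it defaults to "pdftotext" looked up on PATH.
PDFTOTEXT=${PDFTOTEXT:-pdftotext}

pdf_to_text() {
    # $1: path to the PDF file
    if [ ! -r "$1" ]; then
        echo "pdf_to_text: cannot read '$1'" >&2
        return 1
    fi
    if ! command -v "$PDFTOTEXT" >/dev/null 2>&1; then
        echo "pdf_to_text: $PDFTOTEXT not found" >&2
        return 127
    fi
    # "-" as the output file makes pdftotext write to stdout
    "$PDFTOTEXT" "$1" -
}
```

With that in place, `pdf_to_text ~/faxTaxHabitation2005.pdf | tail` should show the same text as the session above without leaving a temporary file behind. From Perl, the same command can be read through a piped open, for example `open(my $fh, '-|', 'pdftotext', $file, '-')`.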
