PerlMonks  

Re^2: extract text from pdf

by jeteve (Pilgrim)
on Nov 08, 2006 at 13:32 UTC


in reply to Re: extract text from pdf
in thread extract text from pdf

I tried both of those, without success.

I have a PDF that I created with OpenOffice. pdftotext extracts its text correctly, whereas CAM::PDF (or File::Extract::PDF) gives me garbage characters.

[jerome@saab pdf]$ getpdftext.pl -v ~/faxTaxHabitation2005.pdf
! " #  $  % # & ' ( "  ) * + + + ...

And with pdftotext:

[jerome@saab pdf]$ pdftotext ~/faxTaxHabitation2005.pdf txt
[jerome@saab pdf]$ tail txt
Merci de bien vouloir me confirmer ces informations par retour de fax afin que je puisse proceder au paiment le plus rapidement possible au numero suivant : *************
Cordiales salutations.
...

The ideal would be a Perl module linked against the xpdf C code. :)
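
In the meantime, a small wrapper around the pdftotext binary is probably the simplest workaround. The following is only a rough sketch: the function name pdf_to_text and its exit codes are made up for illustration, and it assumes pdftotext (from xpdf) is on the PATH. It relies on pdftotext writing to stdout when the output file is given as "-".

```shell
#!/bin/sh
# Hypothetical helper (not part of xpdf): print the text of a PDF on stdout.
# The name and the exit codes are illustrative assumptions.
pdf_to_text() {
    pdf=$1

    # Fail early if the input file is missing or unreadable.
    if [ ! -r "$pdf" ]; then
        echo "pdf_to_text: cannot read '$pdf'" >&2
        return 2
    fi

    # Assumes the pdftotext binary is installed and on the PATH.
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdf_to_text: pdftotext not found in PATH" >&2
        return 127
    fi

    # "-" as the output file makes pdftotext write to stdout;
    # -enc selects the output text encoding.
    pdftotext -enc UTF-8 "$pdf" -
}
```

Something like `pdf_to_text ~/faxTaxHabitation2005.pdf | tail` should then give the same text as the session above without the intermediate file, and a Perl script could read the same command's output through a pipe open.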

-- Nice photos of naked perl sources here !
