I did try both of those, without success.
I have a PDF that I created with OpenOffice. pdftotext is able to extract the text from it, whereas CAM::PDF (or File::Extract::PDF) gives me garbled characters:
[jerome@saab pdf]$ getpdftext.pl -v ~/faxTaxHabitation2005.pdf
! " # $
% # & ' (
" ) *
+ + +
...
And with pdftotext:
[jerome@saab pdf]$ pdftotext ~/faxTaxHabitation2005.pdf txt
[jerome@saab pdf]$ tail txt
Merci de bien vouloir me confirmer ces informations par retour de fax afin que je puisse proceder au paiment le plus rapidement possible au numero suivant : *************
Cordiales salutations.
...
The ideal would be a Perl module linked against the xpdf C code ... :)
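Until such a module exists, one stopgap would be to run the pdftotext binary from Perl and capture its output. This is only a rough sketch: it assumes pdftotext is installed and on the PATH, and relies on pdftotext writing to stdout when the output file is given as "-". The name pdf_to_text and its optional command argument are my own inventions for illustration, not part of CAM::PDF or any existing module.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text of a PDF by running an external
# extractor. Defaults to "pdftotext"; pass another command list after the
# path if the binary has a different name or location.
sub pdf_to_text {
    my ($pdf_path, @cmd) = @_;
    @cmd = ('pdftotext') unless @cmd;

    # List-form pipe open bypasses the shell, so spaces or quotes in the
    # path need no escaping. The trailing "-" asks for output on stdout.
    open(my $fh, '-|', @cmd, $pdf_path, '-')
        or die "Cannot run $cmd[0]: $!";

    local $/;               # slurp mode: read everything in one go
    my $text = <$fh>;

    # close() on a pipe returns false if the child exited non-zero.
    close($fh)
        or die "$cmd[0] failed (exit status " . ($? >> 8) . ")";

    return $text;
}

# Usage: perl thisscript.pl some.pdf
print pdf_to_text($ARGV[0]) if @ARGV;
```

It forks a process per file, so it is no substitute for real bindings to the xpdf code, but it does give the same text that pdftotext produces on the command line.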
--
Nice photos of naked perl sources here !