I have actually had a look at those modules, but all they do is create or manipulate PDFs. For example, PDF::API2 has a method, $string = $pdf->stringify, but this just dumps the file into a string that is still in PDF format, i.e. you get a load of binary rubbish.
As for PDF::Extract ("Extracting sub PDF documents from a multi page PDF document"), again the output is PDF.
I just need the bare ASCII text that pdftotext gives, except that pdftotext has the odd random glitch that corrupts the layout of its output.
If I can't predict the layout, I can't parse it.