http://qs321.pair.com?node_id=1162740


in reply to Read table data from PDF

Hi perlmad,

I am afraid that is a non-trivial task. To see why, please read the following node by almut: Re: CAM::PDF did't extract all pdf's content

I have had good experiences using an external pdf2txt converter and then parsing the output, but this of course depends on your input document.

HTH, Rata

Re^2: Read table data from PDF
by ateague (Monk) on May 11, 2016 at 13:59 UTC
    I have had good experiences using an external pdf2txt converter and then parsing the output, but this of course depends on your input document.

    As a side note, if you go down this route, make absolutely certain that your external program will extract the text with some sort of X/Y position.

    Unless you have full and complete control over the PDF and its generation, parsing PDF text by fixed-position row/column is pretty much guaranteed to end in failure, frustration, and an absolutely massive nest of exceptions and special parsing cases.
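
To illustrate why X/Y positions help, here is a minimal sketch (in Python, purely for illustration) of rebuilding table rows from positioned text fragments. It assumes the external converter has already been coaxed into giving you `(x, y, text)` tuples with y growing downward. That tuple format, the function name `group_into_rows`, and the `y_tolerance` value are all hypothetical; real tools report coordinates in their own formats and units.

```python
from typing import List, Tuple

# Hypothetical fragment format: (x, y, text), with y increasing down the page.
Fragment = Tuple[float, float, str]


def group_into_rows(fragments: List[Fragment], y_tolerance: float = 2.0) -> List[List[str]]:
    """Group fragments whose y positions are close together into rows,
    then order the cells of each row left to right by x."""
    rows = []  # each entry: {"y": y of the row's first fragment, "cells": [(x, text), ...]}
    # Walk the fragments top to bottom so that each row is built contiguously.
    for x, y, text in sorted(fragments, key=lambda f: (f[1], f[0])):
        if rows and abs(y - rows[-1]["y"]) <= y_tolerance:
            # Close enough vertically: same row as the previous fragment.
            rows[-1]["cells"].append((x, text))
        else:
            # Too far below the current row: start a new one.
            rows.append({"y": y, "cells": [(x, text)]})
    # Within each row, sort cells by x and keep only the text.
    return [[text for _, text in sorted(row["cells"], key=lambda c: c[0])] for row in rows]


# Made-up sample data: two rows whose baselines jitter slightly.
sample = [
    (50.0, 100.2, "Name"),
    (200.0, 100.0, "Qty"),
    (50.0, 115.1, "Widget"),
    (200.0, 114.9, "3"),
]
table = group_into_rows(sample)
```

The tolerance is there because fragments on the same visual line rarely share exactly the same y value. Even with positions, merged cells, wrapped text, and multi-line headers will still need document-specific handling.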