PerlMonks  

Re: Read table data from PDF

by Ratazong (Monsignor)
on May 11, 2016 at 10:44 UTC ( [id://1162740]=note )


in reply to Read table data from PDF

Hi perlmad,

I am afraid that is a non-trivial task. To see why, please read the following node by almut: Re: CAM::PDF did't extract all pdf's content

I have had good experiences using an external pdf2txt converter and then parsing the output, but this of course depends on your input document.
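As a rough illustration of the "parse the output" step, here is a minimal sketch in Python. It assumes the converter was run in a layout-preserving mode, so that table columns come out separated by runs of two or more spaces. The sample text and the function name split_layout_rows are made up for the example, and the two-space rule will not hold for every document.

```python
import re

def split_layout_rows(text):
    """Split layout-preserved text into rows of cells.

    Assumption: columns are separated by two or more spaces,
    while words inside one cell are separated by a single space.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between table rows
        rows.append(re.split(r"\s{2,}", line))
    return rows

# Hypothetical converter output for a small three-column table.
sample = (
    "Item          Qty    Price\n"
    "Red widget      2     9.50\n"
    "\n"
    "Bolt           10     0.25\n"
)

rows = split_layout_rows(sample)
for row in rows:
    print(row)
```

This breaks down as soon as a cell is empty or two columns sit close together, which is why the result depends so much on the input document.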

HTH, Rata

Re^2: Read table data from PDF
by ateague (Monk) on May 11, 2016 at 13:59 UTC
    I have had good experiences using an external pdf2txt converter and then parsing the output, but this of course depends on your input document.

    As a side note, if you go down this route, make absolutely certain that your external program will extract the text with some sort of X/Y position.

    Unless you have full and complete control over the PDF and its generation, parsing PDF text by fixed-position row/column is pretty much guaranteed to end in failure, frustration, and an absolutely massive nest of exceptions and special parsing cases.
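To show why the X/Y positions help, here is a minimal sketch in Python. It assumes the extraction step has already produced one (x, y, text) tuple per word. That tuple format, the column start positions in col_starts, the y_tol tolerance, and the name to_table are all assumptions made for the example, not the output of any particular tool.

```python
from bisect import bisect_right

def to_table(words, col_starts, y_tol=2.0):
    """Rebuild table rows from words that carry X/Y positions.

    words      -- iterable of (x, y, text) tuples, one per word
    col_starts -- sorted left edges of the columns (assumed known)
    y_tol      -- words within this distance of a row's first word
                  are treated as belonging to that row
    """
    rows = []  # each entry: (anchor_y, one list of (x, text) per column)
    for x, y, text in sorted(words, key=lambda w: (w[1], w[0])):
        if not rows or abs(y - rows[-1][0]) > y_tol:
            rows.append((y, [[] for _ in col_starts]))
        # Pick the last column whose left edge is at or before x.
        col = max(bisect_right(col_starts, x) - 1, 0)
        rows[-1][1][col].append((x, text))
    # Join the words of each cell from left to right.
    return [
        [" ".join(t for _, t in sorted(cell)) for cell in cells]
        for _, cells in rows
    ]

# Hypothetical word positions; the "Bolt" row has no Qty value.
words = [
    (50.0, 100.0, "Item"), (150.0, 100.4, "Qty"), (230.0, 100.1, "Price"),
    (50.0, 115.2, "Bolt"), (231.0, 114.9, "0.25"),
    (72.0, 130.0, "widget"), (50.0, 130.3, "Red"),
    (150.0, 130.1, "2"), (230.0, 130.2, "9.50"),
]

table = to_table(words, col_starts=[40.0, 140.0, 220.0])
for row in table:
    print(row)
```

Because each word is placed by its coordinates, the missing Qty value in the "Bolt" row stays an empty cell instead of shifting "0.25" into the wrong column, which is what tends to go wrong when parsing plain text alone.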
