I am not aware of a CPAN module that offers a method like
extract_table(page => 42, row => 1, column => 3).
Creating one wouldn't be easy, since PDF operators are more like plotter
commands drawing on a sheet of paper: there is no markup like
an HTML <TABLE> that defines an embedded object.
Are your PDF files generated automatically, that is to say in a repeatable fashion?
I once managed to extract table based information from a series
of automatically generated PDF files after converting them into Postscript
using pdftops (not pdf2ps) and some heuristics.
Quite a game of chance... but maybe it works for you too?
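The heuristics depend entirely on the layout of your files. As a minimal sketch (in Python; the names Fragment, group_into_rows and extract_cell are hypothetical), assuming the text fragments and their page coordinates have already been pulled out of the file somehow, fragments with roughly the same y coordinate can be treated as one table row and then sorted by x into columns:

```python
from collections import namedtuple

# A text fragment at position (x, y) on the page, in PDF user-space units.
# How the fragments are obtained (pdftops output, a decompressed content
# stream, ...) is outside the scope of this sketch.
Fragment = namedtuple("Fragment", ["x", "y", "text"])


def group_into_rows(fragments, y_tolerance=2.0):
    """Group fragments whose y differs from the row's first fragment by at
    most y_tolerance. PDF's y axis points upward, so sorting on descending y
    yields rows top-to-bottom. Cells in a row are sorted left-to-right by x."""
    rows = []
    for frag in sorted(fragments, key=lambda f: -f.y):
        if rows and abs(rows[-1][0].y - frag.y) <= y_tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    return [[f.text for f in sorted(row, key=lambda f: f.x)] for row in rows]


def extract_cell(rows, row, column):
    """Hypothetical helper: 1-based row/column lookup, None if out of range."""
    if 1 <= row <= len(rows) and 1 <= column <= len(rows[row - 1]):
        return rows[row - 1][column - 1]
    return None


if __name__ == "__main__":
    page = [
        Fragment(40, 700, "Name"), Fragment(200, 700, "Qty"),
        Fragment(40, 685, "Apple"), Fragment(200, 684.5, "3"),
        Fragment(40, 670, "Pear"), Fragment(200, 670, "7"),
    ]
    table = group_into_rows(page)
    print(table)
    print(extract_cell(table, 2, 2))
```

The 2-point tolerance is an assumption; it has to be tuned to the font size and line spacing of the actual documents, and this simple grouping breaks down for cells that wrap onto several lines.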
A similar approach: CAM::PDF comes with a tool,
rewritepdf.pl,
which can decompress the internal object streams (-d switch).
Analysing the decompressed PDF file might give some hints. A typical table
ENTRY might be embedded like this:
40 0 Td     <-- x, y offset (Td: move the text position, relative to the start of the current line)
(ENTRY) Tj  <-- ENTRY (Tj: show text)
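To turn such a decompressed content stream into positioned text, a small tokenizer is enough for simple files. The following is a minimal sketch in Python (the same logic ports directly to Perl); positioned_text and unescape are hypothetical names, and it assumes you already have the content stream as text, for example from a file decompressed with rewritepdf.pl -d. Note that Td is relative: each Td offset is added to the start of the current line, and BT resets the position to the origin. The sketch deliberately ignores the cm operator, fonts, TJ arrays, hex strings, T*, ' and ".

```python
import re

# Tokens in a decompressed content stream: literal strings (...) with
# backslash escapes, names such as /F1, numbers, and operator names.
TOKEN = re.compile(
    r"\((?:\\.|[^\\()])*\)"          # literal string (no unescaped nested parens)
    r"|/[^\s/\[\]()<>{}%]*"          # name, e.g. /F1 (skipped below)
    r"|[-+]?(?:\d+\.?\d*|\.\d+)"     # number
    r"|[A-Za-z*'\"]+"                # operator, e.g. BT, Td, Tj
)


def unescape(literal):
    """Strip the surrounding parens and undo the escapes \\( \\) \\\\."""
    return re.sub(r"\\([()\\])", r"\1", literal[1:-1])


def positioned_text(stream):
    """Return (x, y, text) for every Tj, tracking BT, Td/TD and Tm only."""
    results = []
    operands = []
    line_x = line_y = 0.0
    for tok in TOKEN.findall(stream):
        if tok.startswith("("):
            operands.append(unescape(tok))
        elif tok.startswith("/"):
            continue  # names are not needed for positions
        elif tok[0] in "+-.0123456789":
            operands.append(float(tok))
        else:
            if tok == "BT":
                line_x = line_y = 0.0  # text matrix reset to identity
            elif tok in ("Td", "TD") and len(operands) >= 2:
                line_x += operands[-2]  # offset from start of current line
                line_y += operands[-1]
            elif tok == "Tm" and len(operands) >= 6:
                # Absolute position (e, f); scaling/rotation are ignored.
                line_x, line_y = operands[-2], operands[-1]
            elif tok == "Tj" and operands and isinstance(operands[-1], str):
                results.append((line_x, line_y, operands[-1]))
            operands = []  # every operator consumes its operands
    return results


if __name__ == "__main__":
    stream = "BT /F1 10 Tf 40 700 Td (Name)Tj 160 0 Td (Qty)Tj ET"
    print(positioned_text(stream))
```

The resulting (x, y, text) tuples can then be grouped into rows by y and sorted by x, which is where the layout-specific heuristics come in.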
The Wikipedia entry for PDF links to
"Portable Document Format: An Introduction for Programmers", which provides a lightweight introduction and a table of common PDF operators.
Update: argl, it's rewritepdf.pl