Parsing PDFs by text position?

LanX has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to parse PDFs of account balances.

At the moment I'm piping them through pdftotext -layout to get a text representation that preserves the positions, since the fields are in different columns.
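To illustrate what slicing by position means here, a minimal sketch using unpack with fixed-width 'A' fields. The field names, the column widths and the sample line are made up for illustration; the real widths depend on how pdftotext -layout renders the particular PDF.

```perl
use strict;
use warnings;

# Hypothetical field names and column widths (assumptions, not taken
# from a real statement).
my @names    = qw(booked valued code text amount sign);
my $template = 'A7 A7 A5 A20 A12 A2';

sub slice_line {
    my ($line) = @_;
    my %field;
    # 'A' fields take a fixed number of characters and strip trailing blanks.
    @field{@names} = unpack $template, $line;
    # Right-aligned columns keep their leading blanks, so trim those too.
    s/^\s+// for values %field;
    return \%field;
}

# Made-up sample line, padded to the widths above.
my $line = sprintf '%-7s%-7s%-5s%-20s%12s%2s',
    '28.12.', '28.12.', '0036', 'Kartenverfueg', '39,75', '-';

my $row = slice_line($line);
print join('|', map { $row->{$_} } @names), "\n";
```

This only works while every line keeps the same column boundaries, which is exactly the part that tends to get hairy.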

Unfortunately this is getting hairier than I expected, and now I'm wondering whether I'm reinventing a CPAN wheel I can't find ...

So are there modules to parse PDFs (or texts) by clipping positions?

And for plain text, is there anything to reverse the effect of format?

Cheers Rolf

Actually I have two problems:

a) getting the precise word positions, since pdftohtml -xml doesn't split the text at every whitespace:

<text top="239" left="33" width="491" height="7" font="2">28.12. 28.12. 0036 Kartenverfüg 39,75 -</text>
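One rough workaround, sketched under a strong assumption: if every character in the element is taken to be equally wide, each word's left offset can be estimated from the element's left and width attributes. The function name word_positions is hypothetical, the uniform-width assumption is only approximately true for proportional fonts, and in real code an XML parser such as XML::LibXML would be safer than the regexes used here.

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

# Estimate a left offset for every word of one <text> element.
# ASSUMPTION: all characters are equally wide.
sub word_positions {
    my ($element) = @_;
    my ($left)  = $element =~ /\bleft="(\d+)"/  or return;
    my ($width) = $element =~ /\bwidth="(\d+)"/ or return;
    my ($text)  = $element =~ />([^<]*)</       or return;
    return unless length $text;

    my $per_char = $width / length($text);
    my @words;
    while ($text =~ /(\S+)/g) {
        # $-[1] is the character offset where the captured word starts.
        push @words, { word => $1, left => $left + $-[1] * $per_char };
    }
    return @words;
}

my $element = '<text top="239" left="33" width="491" height="7" font="2">'
            . '28.12. 28.12. 0036 Kartenverfüg 39,75 -</text>';

printf "%-14s %.1f\n", $_->{word}, $_->{left} for word_positions($element);
```

The estimated offsets are only as good as the uniform-width assumption, so they are better suited to assigning words to coarse column ranges than to exact positioning.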

b) defining two-dimensional scan templates (reversing format)
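As a sketch of what such a template could look like: each field is described by a row (relative to the first line of a record), a start column and a width, and is cut out of the layout text with substr. The template contents, the scan_record helper and the sample record are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical two-dimensional template: row is relative to the first
# line of a record, col and width are character positions.
my %template = (
    date   => { row => 0, col => 0,  width => 6  },
    amount => { row => 0, col => 30, width => 10 },
    memo   => { row => 1, col => 8,  width => 22 },
);

sub scan_record {
    my ($lines, $first_row) = @_;
    my %record;
    for my $name (keys %template) {
        my $f    = $template{$name};
        my $line = $lines->[ $first_row + $f->{row} ] // '';
        # Pad so that substr never reads past the end of a short line.
        $line = sprintf '%-*s', $f->{col} + $f->{width}, $line;
        my $value = substr $line, $f->{col}, $f->{width};
        $value =~ s/^\s+|\s+$//g;    # trim both ends
        $record{$name} = $value;
    }
    return \%record;
}

# Made-up two-line record in the shape the template expects.
my @lines = (
    sprintf('%-30s%10s', '28.12.', '39,75 -'),
    sprintf('%-8s%s',    '',       'Kartenverfueg 0036'),
);

my $rec = scan_record(\@lines, 0);
print "$_: $rec->{$_}\n" for sort keys %$rec;
```

Keeping the template as data rather than code makes it easy to maintain one template per statement layout.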

I already got pretty far, but I was wondering if there is a recommended way to do it...

other threads about pdf parsing are:

BTW: It's not an OCR issue; I can get all the characters ...