Re: Example Of Using CAM::PDF Like HTML::TokeParser

If the PDF layout is the problem, you may want to consider using pdftotext and playing a little with the -layout option. Compare the output with and without it:

`pdftotext -layout file.pdf file.txt`;
`pdftotext file.pdf second_file.txt`;

You can also extract only the desired pages of the PDF instead of the whole file, which makes the search easier.
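A rough sketch of the page-range idea, assuming your pdftotext accepts -f (first page) and -l (last page), as the Xpdf and Poppler builds do. The function name extract_pages, the page numbers and the file names are only placeholders for illustration:

```shell
# Hypothetical helper: convert only pages FIRST..LAST of a PDF to text,
# keeping the physical layout. Assumes pdftotext is on the PATH.
extract_pages() {
    first=$1   # first page to convert
    last=$2    # last page to convert
    pdf=$3     # input PDF
    out=$4     # output text file
    pdftotext -f "$first" -l "$last" -layout "$pdf" "$out"
}

# Example call (placeholder values):
#   extract_pages 3 5 file.pdf pages_3_to_5.txt
```

Drop -layout from the function if the plain reading-order output turns out to be easier to search for your file.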


Replies are listed 'Best First'.
Re^2: Example Of Using CAM::PDF Like HTML::TokeParser by Limbic~Region (Chancellor) on Oct 08, 2011 at 19:31 UTC
pvaldes, As I indicated in my original post, extracting the text didn't work. What I didn't indicate is that I tried every possible tool and variation I could think of, including commercial products. None of the text extractions produces a consistent enough format for me to get at what I need. I understand that what I want to do is neither ideal nor easy and may be futile. However, I would like to try for myself. Cheers - L~R
