http://qs321.pair.com?node_id=623542

karana has asked for the wisdom of the Perl Monks concerning the following question:

I have a PDF file, say hello.pdf. It contains the following data:

    item price
    abc  10
    def  20
    ghi  30
    mno  12
    xyz  11

I want to search for a particular item and get its price. E.g., given def as input, I want 20 as output. I installed PDF::Parser and many other modules from CPAN, but I can't work out how to grep for a particular string in a PDF file. Please help me do this.

Re: search a pdf file
by Samy_rio (Vicar) on Jun 27, 2007 at 08:49 UTC

    Hi karana, I tried this with the CAM::PDF module. I think it will help you.

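    use strict;
    use warnings;
    use CAM::PDF;
    use CAM::PDF::PageText;

    # First argument: the PDF file; second argument: the item to search for
    my $file   = $ARGV[0];
    my $search = $ARGV[1];

    my $doc = CAM::PDF->new($file) || die "$CAM::PDF::errstr\n";
    my $pages = $doc->numPages();

    for my $pg (1 .. $pages) {
        my $foo = $doc->getPageText($pg);
        next unless defined $foo;    # page produced no text

        # \Q...\E makes the search term match literally, not as a regex
        my ($data) = $foo =~ m/\Q$search\E\s*(\d+)/si;
        next unless defined $data;   # item not found on this page

        print "In $pg page: $search Value is $data\n";
    }

    __END__
    Output is:
    In 1 page: def Value is 20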
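
    The matching step can also be tried on its own, without a PDF. This is only a sketch: $page_text stands in for whatever getPageText() returns (the real layout depends on the PDF, so the sample string is an assumption), and find_price is a hypothetical helper name, not part of CAM::PDF.

```perl
use strict;
use warnings;

# Assumed sample: roughly what the extracted text of one page might look like.
my $page_text = "item price\nabc 10\ndef 20\nghi 30\nmno 12\nxyz 11\n";

# Hypothetical helper: return the number that follows $item, or undef.
sub find_price {
    my ($text, $item) = @_;
    # \b keeps "def" from matching inside a longer word (this assumes the
    # item starts with a word character); \Q...\E matches the item literally.
    my ($price) = $text =~ m/\b\Q$item\E\s+(\d+(?:\.\d+)?)/i;
    return $price;
}

my $price = find_price($page_text, 'def');
print defined $price ? "def costs $price\n" : "def not found\n";
```

    Returning undef for a missing item lets the caller decide what to print, which avoids an "uninitialized value" warning on pages that do not contain the item.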

    Regards,
    Velusamy R.


    eval"print uc\"\\c$_\""for split'','j)@,/6%@0%2,`e@3!-9v2)/@|6%,53!-9@2~j';

      I must add another vote for CAM::PDF. My problem was parsing orders that arrived as PDF, and this module (with a few lines of code, just like the example above) made that hard task easy.

      I examined all the other PDF modules on CPAN and concluded that there is some great code if you want to remix PDFs, but for extracting content, CAM::PDF is the clear winner for me.


      2share!2flame...