PerlMonks  

Re: PDF GetInfo(

by Arguile (Hermit)
on Aug 16, 2002 at 18:01 UTC


in reply to PDF GetInfo(

Rather late on the reply, but I was just doing something similar. I needed to generate an alphabetical index of publications (all PDF) and wanted it in a nice HTML table with other pertinent metadata (pages, creation date, etc.).

What follows is just a bit of cleaned up PDF reading code:

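use strict;
use warnings;
use PDF;    # CPAN's PDF module

# Collect a metadata hashref for every *.pdf file in the current
# directory, sorted alphabetically by title.
opendir my $dir, '.' or die "Can't open current directory: $!";
my @pdf = sort { ($a->{Title} // '') cmp ($b->{Title} // '') }
          grep { defined }                  # drop files that aren't really PDFs
          map  { scalar get_info($_) }      # one hashref per file
          grep { /\.pdf$/i && -f $_ }       # regular files ending in .pdf
          readdir $dir;
closedir $dir;

sub get_info {
    # Get basic PDF metadata.
    my $file = shift;
    my %info;
    my $pdf = PDF->new($file);
    return unless $pdf->IsaPDF;

    $info{Filename} = $file;
    $info{Size}     = -s $file;
    $info{Pages}    = $pdf->Pages;
    $info{$_}       = $pdf->GetInfo($_) for qw(Title CreationDate ModDate);

    return wantarray ? %info : \%info;
}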

That opens the current directory and builds an array of PDF metadata hashes (stored as hash references), sorted alphabetically by title. To sort by different criteria, just change the sort {} block. $pdf[0]{Title} is the title of the first PDF in the array.
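For example, here is a minimal sketch that orders the list by page count (largest first) and then by title. The records are made-up stand-ins for get_info's hashes so it runs without any PDF files, and falling back to Filename when a document has no Title is my own assumption, not something the code above does:

```perl
use strict;
use warnings;

# Hypothetical sample records shaped like get_info's return value.
my @pdf = (
    { Filename => 'b.pdf', Title => 'Beta',  Pages => 12 },
    { Filename => 'a.pdf', Title => 'Alpha', Pages => 40 },
    { Filename => 'c.pdf', Title => undef,   Pages => 12 },
);

# Pages descending, then title ascending; an untitled PDF sorts by its
# filename instead, which avoids "uninitialized value" warnings in cmp.
my @by_pages = sort {
    $b->{Pages} <=> $a->{Pages}
        or ($a->{Title} // $a->{Filename}) cmp ($b->{Title} // $b->{Filename})
} @pdf;

print "$_->{Pages}\t", ($_->{Title} // $_->{Filename}), "\n" for @by_pages;
```

The `or` chains the comparisons: the title comparison only runs when the page counts are equal. The `//` operator needs Perl 5.10 or later.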

print "$_->{Title}\n" for @pdf;

will give you a plain-text list of all the document titles. If you only want titles, just drop the hash building in the sub and return the title scalar on its own.
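Since the whole point was an HTML index, here is one rough way to turn @pdf into a table. It assumes the hash keys set by get_info above; html_escape and pdf_table are hypothetical helpers written in core Perl (no extra CPAN modules), the sample record is invented, and dates are printed exactly as stored rather than reformatted:

```perl
use strict;
use warnings;

# Hypothetical helper: escape characters that are special in HTML.
sub html_escape {
    my $s = $_[0] // '';
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: one table row per PDF metadata hashref.
sub pdf_table {
    my @rows = map {
        my $href = $_->{Filename};
        # Percent-encode the filename for the link target (byte-wise).
        $href =~ s{([^A-Za-z0-9._~/-])}{sprintf('%%%02X', ord $1)}ge;
        sprintf qq{<tr><td><a href="%s">%s</a></td><td>%s</td><td>%s</td><td>%s</td></tr>\n},
            html_escape($href),
            html_escape($_->{Title} // $_->{Filename}),
            html_escape($_->{Pages}),
            html_escape($_->{Size}),
            html_escape($_->{CreationDate});
    } @_;
    return "<table>\n"
         . "<tr><th>Title</th><th>Pages</th><th>Bytes</th><th>Created</th></tr>\n"
         . join('', @rows)
         . "</table>\n";
}

# Invented sample record standing in for get_info's output.
my @pdf = (
    { Filename => 'a b.pdf', Title => 'R&D <2002>', Pages => 3,
      Size => 1024, CreationDate => 'D:20020816' },
);
print pdf_table(@pdf);
```

Escaping every field matters because PDF titles can contain characters like & and < that would otherwise break the markup.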