With the help of Alan Fry I managed to get a fast solution like this:
sub gettitle {
    use Fcntl;
    my $file = shift;

    # Slurp the whole file in binary mode.
    local *IN;
    sysopen( IN, $file, O_RDONLY, 0 )
      or die "while reading: '$file': $!\n";
    binmode IN;
    read IN, my ($str), -s $file;
    close IN;

    # The trailer names the Info object, e.g. "/Info 12 0 R".
    my ($info_block) = ( $str =~ /\/Info\s(\d+)\s0\sR/ )
      or die "cannot get /Info paragraph\n";

    # Find "N 0 obj" at the start of a line, so that "12 0 obj"
    # is not matched inside "112 0 obj".
    my $searchpos = -1;
    my $info_start;
    while (1) {
        $info_start = index( $str, "$info_block 0 obj", $searchpos + 1 );
        die "cannot get position of '$info_block 0 obj'\n"
          if $info_start < $searchpos + 1;
        last
          if $info_start == 0
          || substr( $str, $info_start - 1, 1 ) =~ /\015|\012/;
        $searchpos = $info_start;
    }

    # The Info dictionary ends at the next ">>".
    my $info_obj = substr( $str, $info_start,
        index( $str, ">>", $info_start ) - $info_start + 2 );

    # Take the title up to the last ")" on the same line.
    my ($title) =
      ( $info_obj =~ /\/Title\s*\( ([^\015\012]*) \) /x )
      or return 'undefined';
    return $title;
}
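To show why the loop checks the character in front of each hit, here is a minimal sketch of just that step, run on an in-memory string. The helper name find_obj_start and the sample data are made up for illustration and are not part of the solution above; the point is only that a plain index() for "12 0 obj" would also hit inside "112 0 obj".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return the offset of
# "$num 0 obj" in $str, accepting only matches that begin a line.
sub find_obj_start {
    my ( $str, $num ) = @_;
    my $needle = "$num 0 obj";
    my $pos    = -1;
    while ( ( $pos = index( $str, $needle, $pos + 1 ) ) >= 0 ) {

        # Accept a match at offset 0 or one preceded by CR or LF.
        return $pos
          if $pos == 0
          || substr( $str, $pos - 1, 1 ) =~ /[\015\012]/;
    }
    return -1;    # no line-initial match found
}

# Made-up sample data: "12 0 obj" also occurs inside "112 0 obj".
my $sample = "112 0 obj\012<< /Length 0 >>\012endobj\012"
           . "12 0 obj\012<< /Title (Demo) >>\012endobj\012";

my $start = find_obj_start( $sample, 12 );
print "object 12 starts at offset $start\n";
```

The first hit at offset 1 is rejected because it is preceded by "1", so the search continues to the real object header further down.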
I also compared the performance of the above solution with
Text::PDF and PDF-111 from CPAN. The test set consisted of 36 PDF files
totalling 3.8 MB.
The runtime ratios of
index solution from above : Text::PDF methods : PDF-111
were:
1 : 6 : 12
PDF-111 from CPAN has other flaws too. The author didn't respond to my questions.
IMHO it should be dumped. It has a far too prominent place in the module hierarchy.