PerlMonks  

PDF::API2 - issues with "delete()" and "title()" methods in Outline.pm

by ateague (Monk)
on Sep 02, 2022 at 22:54 UTC ( [id://11146650]=perlquestion )

ateague has asked for the wisdom of the Perl Monks concerning the following question:

Good morning!

I am using PDF::API2 version 2.043 on Strawberry Perl 5.032 and am running into some problems figuring out how to use the title() and delete() methods on bookmark outlines in PDFs with existing bookmark outlines.

I have a collection of PDFs that already have outlines in them and I was hoping to rename/re-title the outline on page 1 and delete all other outlines in the PDF. I have not been able to do so and no matter what I have tried, the outlines do not appear to be modified when I open the PDF in a viewer after running my script.

Can this module manipulate/delete existing outlines in a PDF? I even went so far as to "Use the Source, Luke" and took a quick shufti at the module tests to look for example syntax, and it appears that the tests do not verify whether the outlines are actually deleted or edited in the PDF itself. Looking at the actual PDF contents shows that the module is simply slapping some extra objects and xref content after the end of the file.

Here is a small example test case that illustrates the problems I am having:

#!/usr/bin/perl
use 5.032;
use warnings;
use strict;
use PDF::API2;
use Test::More tests => 2;

my ($stringy_bare_pdf, $stringy_outline_pdf);

BARE_PDF: {
    my $pdf = PDF::API2->new(-compress => 0);
    my $page1 = $pdf->page();

    $stringy_bare_pdf = $pdf->to_string();
}

OUTLINE: {
    my $pdf = PDF::API2->new(-compress => 0);
    my $page1 = $pdf->page();

    my $outlines = $pdf->outlines();
    my $outline = $outlines->outline();
    $outline->title('Test Outline 1');
    $outline->dest(1);

    $stringy_outline_pdf = $pdf->to_string();
}

DELETE_OUTLINE: {
    my $pdf = PDF::API2->from_string($stringy_outline_pdf, -compress => 0) or die $!;

    my $root = $pdf->outlines();
    my $outline = $root->outline();
    $outline->delete();

    is($pdf->to_string(), $stringy_bare_pdf, 'Make sure the outline actually got deleted from the PDF');
}

MODIFY_OUTLINE: {
    my $pdf = PDF::API2->from_string($stringy_outline_pdf, -compress => 0) or die $!;

    my $root = $pdf->outlines();
    my $outline = $root->outline();
    $outline->title('Test Outline 2');
    $outline->dest(1);

    is($pdf->to_string(), $stringy_outline_pdf, 'Make sure the outline text actually got changed in the PDF');
}

done_testing();

And here is the output of the tests showing the expected and actual PDF content:

perl pdf_test.pl
1..2
not ok 1 - Make sure the outline actually got deleted from the PDF
#   Failed test 'Make sure the outline actually got deleted from the PDF'
#   at pdf_test.pl line 36.
#          got: '%PDF-1.4
# %╞══╡
# 1 0 obj << /Type /Catalog /Outlines 6 0 R /PageMode /SinglePage /Pages 2 0 R /ViewerPreferences << /NonFullScreenPageMode /UseNone >> >> endobj
# 2 0 obj << /Type /Pages /Count 1 /Kids [ 5 0 R ] /Resources 3 0 R >> endobj
# 3 0 obj << /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] >> endobj
# 4 0 obj << /Producer (PDF::API2 2.043 \(MSWin32\)) >> endobj
# 5 0 obj << /Type /Page /Parent 2 0 R /Resources << /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] >> >> endobj
# 6 0 obj << /Type /Outlines /Count 1 /First 7 0 R /Last 7 0 R >> endobj
# 7 0 obj << /Dest (1) /Parent 6 0 R /Title (Test Outline 1) >> endobj
# xref
# 0 8
# 0000000000 65535 f
# 0000000015 00000 n
# 0000000159 00000 n
# 0000000235 00000 n
# 0000000304 00000 n
# 0000000365 00000 n
# 0000000477 00000 n
# 0000000548 00000 n
# trailer
# << /Info 4 0 R /Root 1 0 R /Size 8 >>
# startxref
# 617
# %%EOF
# 9 0 obj << /Parent 6 0 R >> endobj
# xref
# 0 1
# 0000000000 65535 f
# 9 1
# 0000000852 00000 n
# trailer
# << /Info 4 0 R /Prev 617 /Root 1 0 R /Size 10 >>
# startxref
# 887
# %%EOF
# '
#     expected: '%PDF-1.4
# %╞══╡
# 1 0 obj << /Type /Catalog /PageMode /SinglePage /Pages 2 0 R /ViewerPreferences << /NonFullScreenPageMode /UseNone >> >> endobj
# 2 0 obj << /Type /Pages /Count 1 /Kids [ 5 0 R ] /Resources 3 0 R >> endobj
# 3 0 obj << /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] >> endobj
# 4 0 obj << /Producer (PDF::API2 2.043 \(MSWin32\)) >> endobj
# 5 0 obj << /Type /Page /Parent 2 0 R /Resources << /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] >> >> endobj
# xref
# 0 6
# 0000000000 65535 f
# 0000000015 00000 n
# 0000000143 00000 n
# 0000000219 00000 n
# 0000000288 00000 n
# 0000000349 00000 n
# trailer
# << /Info 4 0 R /Root 1 0 R /Size 6 >>
# startxref
# 461
# %%EOF
# '
not ok 2 - Make sure the outline text actually got changed in the PDF
#   Failed test 'Make sure the outline text actually got changed in the PDF'
#   at pdf_test.pl line 47.
#          got: '%PDF-1.4
# %╞══╡
# 1 0 obj << /Type /Catalog /Outlines 6 0 R /PageMode /SinglePage /Pages 2 0 R /ViewerPreferences << /NonFullScreenPageMode /UseNone >> >> endobj
# 2 0 obj << /Type /Pages /Count 1 /Kids [ 5 0 R ] /Resources 3 0 R >> endobj
# 3 0 obj << /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] >> endobj
# 4 0 obj << /Producer (PDF::API2 2.043 \(MSWin32\)) >> endobj
# 5 0 obj << /Type /Page /Parent 2 0 R /Resources << /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] >> >> endobj
# 6 0 obj << /Type /Outlines /Count 1 /First 7 0 R /Last 7 0 R >> endobj
# 7 0 obj << /Dest (1) /Parent 6 0 R /Title (Test Outline 1) >> endobj
# xref
# 0 8
# 0000000000 65535 f
# 0000000015 00000 n
# 0000000159 00000 n
# 0000000235 00000 n
# 0000000304 00000 n
# 0000000365 00000 n
# 0000000477 00000 n
# 0000000548 00000 n
# trailer
# << /Info 4 0 R /Root 1 0 R /Size 8 >>
# startxref
# 617
# %%EOF
# 9 0 obj << /Dest (1) /Parent 6 0 R /Title (Test Outline 2) >> endobj
# xref
# 0 1
# 0000000000 65535 f
# 9 1
# 0000000852 00000 n
# trailer
# << /Info 4 0 R /Prev 617 /Root 1 0 R /Size 10 >>
# startxref
# 921
# %%EOF
# '
#     expected: '%PDF-1.4
# %╞══╡
# 1 0 obj << /Type /Catalog /Outlines 6 0 R /PageMode /SinglePage /Pages 2 0 R /ViewerPreferences << /NonFullScreenPageMode /UseNone >> >> endobj
# 2 0 obj << /Type /Pages /Count 1 /Kids [ 5 0 R ] /Resources 3 0 R >> endobj
# 3 0 obj << /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] >> endobj
# 4 0 obj << /Producer (PDF::API2 2.043 \(MSWin32\)) >> endobj
# 5 0 obj << /Type /Page /Parent 2 0 R /Resources << /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] >> >> endobj
# 6 0 obj << /Type /Outlines /Count 1 /First 7 0 R /Last 7 0 R >> endobj
# 7 0 obj << /Dest (1) /Parent 6 0 R /Title (Test Outline 1) >> endobj
# xref
# 0 8
# 0000000000 65535 f
# 0000000015 00000 n
# 0000000159 00000 n
# 0000000235 00000 n
# 0000000304 00000 n
# 0000000365 00000 n
# 0000000477 00000 n
# 0000000548 00000 n
# trailer
# << /Info 4 0 R /Root 1 0 R /Size 8 >>
# startxref
# 617
# %%EOF
# '
# Looks like you failed 2 tests of 2.

Thank you for your time!

Replies are listed 'Best First'.
Re: PDF::API2 - issues with "delete()" and "title()" methods in Outline.pm
by vr (Curate) on Sep 03, 2022 at 13:54 UTC

    Update. PDF::API2 supports incremental PDF modification only: the raw file can only grow, so comparing whole-file "snapshots" is useless. Modified objects must be marked as such, so that they are "slapped" after the original %%EOF on save. That, mostly, is what the patch does (plus a few more things): every "setter" method should include $self->{' api'}->{'pdf'}->out_obj($self);, and/or a similar line if the parent or siblings are modified. I stopped patching halfway; now I see that your particular intent was to change bookmark titles, so patch the title setter too and it will work. Note that PDF::API2::Outline::outline always adds a new descendant to its invocant; PDF::API2::outline is different.
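To make the "file can only grow" point concrete, here is a minimal sketch in plain Perl, with no PDF::API2 calls, of a check that fits incremental saves better than string equality. The helper names count_eof_markers and is_incremental_update are made up for illustration, and the two sample strings are stand-ins that only mimic the layout seen in the original post, not real PDFs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the %%EOF markers in a PDF held in a string. Each incremental
# save normally appends one more, so a count above 1 suggests that the
# file carries appended revisions. This is a heuristic only: the byte
# sequence could in principle also occur inside a stream.
sub count_eof_markers {
    my ($pdf_string) = @_;
    my $count = () = $pdf_string =~ /%%EOF/g;
    return $count;
}

# True when $modified is $original followed by extra bytes, which is
# what an incremental update looks like at the byte level.
sub is_incremental_update {
    my ($original, $modified) = @_;
    return length($modified) > length($original)
        && index($modified, $original) == 0;
}

# Stand-in strings (hypothetical, NOT valid PDFs).
my $original = "%PDF-1.4\n1 0 obj << >> endobj\n%%EOF\n";
my $modified = $original . "9 0 obj << >> endobj\n%%EOF\n";

printf "EOF markers in modified: %d\n", count_eof_markers($modified);
printf "appended after original: %s\n",
    is_incremental_update($original, $modified) ? 'yes' : 'no';
```

A check like this only confirms that something was appended. To verify the outline itself, it is probably more reliable to reopen the saved file and inspect the outline tree than to compare raw bytes.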

    The line added to the "main" module, API2.pm, is a lazy way to ensure that objects are re-blessed, so that e.g. $pdf->outline->first->delete won't go boom. The two lines commented out in the patch look harmless but useless to me: the parent method is never called as a setter, and deleting the ' children' array achieves nothing. Also, look at the modified PDF content in the OP: an object number is skipped (the appended object is 9, although the original file ends at object 7). It's just untidy, and I wonder whether it was always this way. These latter comments are for the maintainer (Steve?), if they are interested.

    __________

    (PDF::API2... patch and fix and patch again.) Below are patches for two files in the distribution, plus your modified test program, which produces four PDF files with, I think, the expected content. I'll add to this node later to explain why your tests as designed make no sense and why this is only a "demo" patch: I ran out of time and gave up patching further, so only your immediate test cases are fixed.

    --- C:\berrybrew\strawberry-perl-5.32.1.1-64bit-PDL\perl\site\lib\PDF\API2.pm.backup Wed Dec 8 07:53:45 2021
    +++ C:\berrybrew\strawberry-perl-5.32.1.1-64bit-PDL\perl\site\lib\PDF\API2.pm Sat Sep 3 13:36:58 2022
    @@ -824,6 +824,7 @@
                 bless $obj, 'PDF::API2::Outlines';
                 $obj->{' api'} = $self;
                 weaken $obj->{' api'};
    +            $obj->count();
             }
             else {
                 $obj = PDF::API2::Outlines->new($self);
    --- C:\berrybrew\strawberry-perl-5.32.1.1-64bit-PDL\perl\site\lib\PDF\API2\Outline.pm.backup Wed Dec 8 07:53:45 2021
    +++ C:\berrybrew\strawberry-perl-5.32.1.1-64bit-PDL\perl\site\lib\PDF\API2\Outline.pm Sat Sep 3 16:23:35 2022
    @@ -90,25 +90,25 @@
         if ($count) {
             $self->{'Count'} = PDFNum($self->is_open() ? $count : -$count);
         }
    +    else {
    +        delete $self->{'Count'}
    +    }
         return $count;
     }

     sub _load_children {
         my $self = shift();
    +    $self->{' children'} = [];
         my $item = $self->{'First'};
    -    return unless $item;
    -    $item->realise();
    -    bless $item, __PACKAGE__;
    -
    -    push @{$self->{' children'}}, $item;
    -    while ($item->next()) {
    -        $item = $item->next();
    +
    +    while ($item) {
             $item->realise();
             bless $item, __PACKAGE__;
    +        $item->{' api'} = $self->{' api'};
             push @{$self->{' children'}}, $item;
    +        $item = $item->next()
         }
    -
         return $self;
     }

     =head3 first
    @@ -121,8 +121,10 @@

     sub first {
         my $self = shift();
    -    if (defined $self->{' children'} and defined $self->{' children'}->[0]) {
    -        $self->{'First'} = $self->{' children'}->[0];
    +    if (exists $self->{' children'}) {
    +        $self->{'First'} = @{$self->{' children'}}
    +            ? $self->{' children'}[0]
    +            : undef
         }
         return $self->{'First'};
     }
    @@ -137,8 +139,10 @@

     sub last {
         my $self = shift();
    -    if (defined $self->{' children'} and defined $self->{' children'}->[-1]) {
    -        $self->{'Last'} = $self->{' children'}->[-1];
    +    if (exists $self->{' children'}) {
    +        $self->{'Last'} = @{$self->{' children'}}
    +            ? $self->{' children'}[-1]
    +            : undef
         }
         return $self->{'Last'};
     }
    @@ -154,7 +158,7 @@

     sub parent {
         my $self = shift();
    -    $self->{'Parent'} = shift() if defined $_[0];
    +#    $self->{'Parent'} = shift() if defined $_[0];
         return $self->{'Parent'};
     }

    @@ -167,8 +171,11 @@
     =cut

     sub prev {
    -    my $self = shift();
    -    $self->{'Prev'} = shift() if defined $_[0];
    +    my ($self, $other) = @_;
    +    if ($other) {
    +        $self->{'Prev'} = $other;
    +        $self->{' api'}{'pdf'}->out_obj($self);
    +    }
         return $self->{'Prev'};
     }

    @@ -181,8 +188,11 @@
     =cut

     sub next {
    -    my $self = shift();
    -    $self->{'Next'} = shift() if defined $_[0];
    +    my ($self, $other) = @_;
    +    if ($other) {
    +        $self->{'Next'} = $other;
    +        $self->{' api'}{'pdf'}->out_obj($self);
    +    }
         return $self->{'Next'};
     }

    @@ -200,7 +210,9 @@
         my $self = shift();

         my $child = PDF::API2::Outline->new($self->{' api'}, $self);
    -    $self->{' children'} //= [];
    +
    +    $self->_load_children() unless exists $self->{' children'};
    +
         $child->prev($self->{' children'}->[-1]) if @{$self->{' children'}};
         $self->{' children'}->[-1]->next($child) if @{$self->{' children'}};
         push @{$self->{' children'}}, $child;
    @@ -208,6 +220,7 @@
             $self->{' api'}->{'pdf'}->new_obj($child);
         }

    +    $self->{' api'}->{'pdf'}->out_obj($self);
         return $child;
     }

    @@ -268,6 +281,7 @@
             $item = $item->next();
             push @{$self->{' children'}}, $item;
         }
    +    $self->{' api'}->{'pdf'}->out_obj($self);
         return $self;
     }

    @@ -291,7 +305,8 @@

         my $siblings = $self->parent->{' children'};
         @$siblings = grep { $_ ne $self } @$siblings;
    -    delete $self->parent->{' children'} unless $self->parent->has_children();
    +#    delete $self->parent->{' children'} unless $self->parent->has_children();
    +    $self->{' api'}->{'pdf'}->out_obj($self->parent);
         return;
     }

    ############

    #!/usr/bin/perl
    use 5.032;
    use warnings;
    use strict;
    use PDF::API2;

    my ($stringy_bare_pdf, $stringy_outline_pdf);

    BARE_PDF: {
        my $pdf = PDF::API2->new(-compress => 0);
        my $page1 = $pdf->page();

        $pdf->save('bare.pdf');
    }

    OUTLINE: {
        my $pdf = PDF::API2->new(-compress => 0);
        my $page1 = $pdf->page();

        my $outlines = $pdf->outlines();
        my $outline = $outlines->outline();
        $outline->title('Test Outline 1');
        $outline->dest(1);

        $stringy_outline_pdf = $pdf->to_string();
        open my $fh, '>', 'outline.pdf' or die;
        binmode $fh;
        print $fh $stringy_outline_pdf;
        close $fh;
    }

    DELETE_OUTLINE: {
        my $pdf = PDF::API2->from_string($stringy_outline_pdf, -compress => 0) or die $!;

        my $root = $pdf->outlines();
        my $outline = $root->first();
        $outline->delete();

        $pdf->save('outline_deleted.pdf');
    }

    MODIFY_OUTLINE: {
        my $pdf = PDF::API2->from_string($stringy_outline_pdf, -compress => 0) or die $!;

        my $root = $pdf->outlines();
        my $outline = $root->outline();
        $outline->title('Test Outline 2');
        $outline->dest(1);

        $pdf->save('outline_modified.pdf');
    }

      Thank you very much for the clarification and the patches.

      So my code to rename and delete the outlines was correct, but at the end of the day, unpatched PDF::API2 version 2.043 does not currently support renaming or completely deleting existing outlines in a PDF.

        The example test relies on incorrect use of PDF::API2::Outline::outline, contrary to its documentation, but the code you have not shown may in fact be correct if it traverses the bookmark tree with the other (too verbose for an example test?) methods, i.e. First/Next/Prev/Last.
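For readers wondering what such a traversal might look like, here is a rough sketch. It walks the existing top-level bookmarks with first() and next() and never calls outline(), which, as noted earlier in the thread, adds a new descendant. The helper names outline_children and keep_only_first are made up for illustration. The sketch assumes that first() and next() return usable outline objects, which according to this thread requires the patched module, and that the title setter has also been patched so the change is written on save. It handles top-level bookmarks only.

```perl
use strict;
use warnings;

# Collect the direct children of an outline node by following the
# first/next chain. Hypothetical helper; it deliberately avoids
# outline(), which would create a new bookmark.
sub outline_children {
    my ($node) = @_;
    my @children;
    my $item = $node->first();
    while ($item) {
        push @children, $item;
        $item = $item->next();
    }
    return @children;
}

# Retitle the first top-level bookmark and delete the others.
# The list is gathered before anything is deleted, so the walk does
# not depend on how delete() relinks the siblings.
sub keep_only_first {
    my ($root, $new_title) = @_;
    my ($first, @rest) = outline_children($root);
    return unless $first;
    $first->title($new_title);
    $_->delete() for @rest;
    return $first;
}
```

With a patched module, one might call keep_only_first($pdf->outlines(), 'New title') before saving. Whether the result is actually written to the file still depends on the modified objects being marked for output, as described above.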
