PDF::API2 printing non ascii characters

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: PDF::API2 printing non ascii characters by vr (Curate) on Mar 13, 2018 at 12:38 UTC
Started writing this before thanos1983's update (use of 'DejaVuSans.ttf' as an OK modern font, too :) ), so, FWIW:

The "core" Helvetica font uses a single-byte built-in encoding, which doesn't have Greek characters. In fact, nowadays it is not advisable to use any of the Adobe 14 "core", not-to-be-embedded fonts; they belong to the era of 20+ years ago, when storage space was at a premium. That holds even if you think that you produce (and consume) PDFs in a very controlled, ASCII-only environment.

That said, the "core" font which contains Greek and other math characters is called 'Symbol'. You give normal, utf8 Perl strings as arguments to `PDF::API2` methods, and everything will be encoded for you automatically.

    use strict;
    use warnings;
    use utf8;
    use PDF::API2;

    my $pdf  = PDF::API2-> new;
    my $page = $pdf-> page;
    my $text = $page-> text;

    my $core_font = $pdf-> corefont( 'Symbol' );
    $text-> font( $core_font, 20 );
    $text-> translate( 50, 700 );
    $text-> text( 'ω ∞' );

    $pdf-> saveas( 'test.pdf' );

The output is a 5 KB file, which, in addition to the necessary overhead, contains a lot of bloat. `PDF::API2` doesn't handle "core" fonts quite optimally. Let's insert this before the last line:

    delete @$core_font{ qw/ Encoding FirstChar LastChar Name Widths /};

The output is 986 bytes. The bad part, however, is that while the PDF looks OK on-screen, text extraction (e.g. copy-paste to Notepad) is broken in both cases above when I check with Adobe Reader DC (i.e. the latest) -- garbage is copied. Maybe Adobe doesn't care about "core" Symbol any more. However, both Firefox and Edge extract the Greek symbols correctly.

The right way is to use embeddable, modern TrueType fonts with good Unicode support, i.e. a large code-point repertoire. Again, give "utf8 Perl strings as arguments to `PDF::API2` methods, everything will be encoded for you automatically".

    use strict;
    use warnings;
    use feature 'say';
    use utf8;
    use PDF::API2;

    my $pdf  = PDF::API2-> new;
    my $page = $pdf-> page;
    my $text = $page-> text;

    my $ttf_font = $pdf-> ttfont( 'DejaVuSans.ttf' );
    $text-> font( $ttf_font, 20 );
    $text-> translate( 50, 700 );
    $text-> text( 'ω ∞ latin אב хчп юя' );

    $pdf-> saveas( 'test.pdf' );

Here the text string sports Greek, (extended-)Latin, Hebrew and Cyrillic characters. It displays OK on-screen and text can be extracted even with backward Reader DC. File size is 55 KB, however.
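One way to decide up front whether a string can go through a single-byte text font such as "core" Helvetica is to look for characters above U+00FF. The following is only a minimal sketch: it assumes a Latin-1-style encoding for the core font (an assumption, the encoding `PDF::API2` actually applies may differ), and `chars_outside_latin1` is a hypothetical helper name, not part of any module.

```perl
use strict;
use warnings;
use utf8;    # the string literal below contains non-ASCII characters

# Hypothetical helper: return each distinct character whose code point
# does not fit in one byte (assumed Latin-1-style range U+0000..U+00FF).
sub chars_outside_latin1 {
    my ($string) = @_;
    my %seen;
    return grep { ord($_) > 0xFF && !$seen{$_}++ } split //, $string;
}

my @missing = chars_outside_latin1('ω ∞ latin');

# Print code points instead of the characters themselves, so no output
# encoding layer is needed for this check.
printf "U+%04X\n", ord($_) for @missing;
```

If the list is non-empty, an embedded TrueType font is probably the safer choice for that string.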
Re: PDF::API2 printing non ascii characters by thanos1983 (Parson) on Mar 13, 2018 at 10:29 UTC
Hello Anonymous Monk,

One possible way could be with HTML::Entities. Sample of code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::Entities;
    use open ':std', ':encoding(UTF-8)';

    my $html = "Character one: &#969; character two: &#8734;";
    print decode_entities($html), "\n";

    __END__
    $ perl test.pl
    Character one: ω character two: ∞

Update: Adding complete answer. Sample of code from PDF::API2 / unicode characters.

The solution to your problem is to load an appropriate font with one of the font methods. From the documentation PDF::API2/FONT_METHODS:

    FONT METHODS

    @directories = PDF::API2::addFontDirs($dir1, $dir2, ...)
        Adds one or more directories to the search path for finding font files.
        Returns the list of searched directories.

    $font = $pdf->corefont($fontname, [%options])
        Returns a new Adobe core font object.

In my sample of code I only use one font, but if you follow the documentation you can add more. I downloaded the fonts from DejaVu Fonts. Sample of working code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use PDF::API2;
    use HTML::Entities;

    # Create a blank PDF file
    my $pdf = PDF::API2->new();

    # Add a blank page
    my $page = $pdf->page();

    # Load a TrueType font that contains the required glyphs
    my $font = $pdf->ttfont('DejaVuSans.ttf');

    # Add some text to the page
    my $text = $page->text();
    $text->font($font, 20);
    $text->translate(80, 710);

    my $html = "Character one: &#969; character two: &#8734;";
    my $decoded_string = decode_entities($html);
    $text->text($decoded_string);

    # Save the PDF
    $pdf->saveas('test.pdf');

Let us know if this works for you.

BR / Thanos.

Seeking for Perl wisdom...on the process of learning...not there...yet!
Re^2: PDF::API2 printing non ascii characters by Anonymous Monk on Mar 13, 2018 at 15:08 UTC
What if the submitted HTML input is "%CF%89%20%E2%88%9E" (ω ∞) instead of the numeric codes below?

`&#969; &#8734;`

How do I decode that before handing it over to the PDF text method?
Re^3: PDF::API2 printing non ascii characters by thanos1983 (Parson) on Mar 13, 2018 at 15:40 UTC
Hello again Anonymous Monk,

In this case you can use URI::Escape. See the sample below:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use URI::Escape;
    use feature 'say';

    my $str = "Character one: ω character two: ∞";

    # Percent-encode the string, then reverse the encoding
    my $hex_code = uri_escape( $str );
    say $hex_code;

    my $string = uri_unescape( $hex_code );
    say $string;

    __END__
    $ perl test.pl
    Character%20one%3A%20%CF%89%20character%20two%3A%20%E2%88%9E
    Character one: ω character two: ∞

Hope this helps, BR.
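For the percent-encoded input asked about above, unescaping alone yields UTF-8 bytes rather than Perl characters, so a decoding step is likely needed before the string goes to the PDF text method. A minimal sketch using only core modules: `percent_decode_to_chars` is a hypothetical helper name, and the regex stands in for the byte-level work that `uri_unescape` performs.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn percent-encoded UTF-8 into a Perl character string.
sub percent_decode_to_chars {
    my ($escaped) = @_;

    # Step 1: replace each %XX escape with the single byte it names.
    ( my $bytes = $escaped ) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

    # Step 2: interpret those bytes as UTF-8 to obtain characters.
    return decode( 'UTF-8', $bytes );
}

my $chars = percent_decode_to_chars('%CF%89%20%E2%88%9E');

# Report code points rather than printing the characters directly.
printf "%d characters: %s\n", length($chars),
    join ' ', map { sprintf 'U+%04X', ord($_) } split //, $chars;
# prints "3 characters: U+03C9 U+0020 U+221E"
```

The resulting `$chars` is an ordinary character string of the kind the earlier replies pass to `$text->text(...)`.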
Re^2: PDF::API2 printing non ascii characters by Anonymous Monk on Mar 13, 2018 at 13:09 UTC
It works (bouncing!!!) Thank you so much!!!
Re: PDF::API2 printing non ascii characters by ablanke (Monsignor) on Mar 13, 2018 at 12:40 UTC
Hi, take a look at this thread: PDF::API2 / unicode characters. You can only print glyphs/characters which are provided by the font. If your font has these glyphs, please check whether you have to convert the character encoding.

