PerlMonks  

Re^2: UTF8 Output with XML::Feed? (use utf8)

by mldvx4 (Friar)
on Mar 07, 2022 at 17:55 UTC ( [id://11141890] )


in reply to Re: UTF8 Output with XML::Feed? (use utf8)
in thread UTF8 Output with XML::Feed?

Thanks. Adding use utf8; was one of the first things I tried. I've also tried opening stdout as :utf8 but that doesn't help either. Adding an additional print() shows that the script itself is handling UTF8, or at least looks like it is, but XML::Feed seems not to.

#!/usr/bin/perl

use utf8;
use open ':encoding(utf8)';
use XML::Feed;
use English;

use strict;
use warnings;

my $d='Feed from a to ';
my $t='abc...';

my $feed = XML::Feed->new('RSS');
$feed->title('Feed');
$feed->link('https://www.example.com/feed.rss');
$feed->language('en');
$feed->description($d);

my $entry = XML::Feed::Entry->new();
$entry->link('https://www.example.com/one.html');
$entry->title($t);
$feed->add_entry($entry);

print "Description: $d\n";
print "Title: $t\n";
print $feed->as_xml;

exit(0)

For what it's worth, the following appears to produce only a blank line.

#!/usr/bin/perl
use utf8;
print "\N{LATIN SMALL LETTER A WITH RING ABOVE}\n";

The terminal is xfce4-terminal 0.8.10 (Xfce 4.16) and set to use UTF-8. Pressing the keys "" appear to show the right characters.

Re^3: UTF8 Output with XML::Feed? (use utf8)
by kcott (Archbishop) on Mar 07, 2022 at 18:47 UTC
    "For what it's worth, the following appears to produce only a blank line."

    Please go back and (re)read the utf8 documentation; paying particular attention to the very clear and emboldened directive:

    Do not use this pragma for anything else than telling Perl that your script is written in UTF-8.

    The code you presented only contains 7-bit ASCII characters.

    You got what appeared to be a blank line. Here are some things you could have tried:

    $ perl -e 'print "\N{LATIN SMALL LETTER A WITH RING ABOVE}\n";'

    $ perl -e 'print "|\N{LATIN SMALL LETTER A WITH RING ABOVE}|\n";'
    | |
    $ perl -C -e 'print "\N{LATIN SMALL LETTER A WITH RING ABOVE}\n";'

    $ perl -e 'use open OUT => qw{:encoding(UTF-8) :std}; print "\N{LATIN SMALL LETTER A WITH RING ABOVE}\n";'

    See: perlrun for -C; and, the open pragma.
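The blank line from the original one-liner is consistent with Perl emitting the character as the single Latin-1 byte E5, which is not valid UTF-8 on its own, whereas an :encoding(UTF-8) layer emits the two-byte UTF-8 form. Here is a minimal sketch of that difference. It prints to in-memory handles so the result does not depend on the terminal. emitted_bytes() and hex_bytes() are hypothetical helper names made up for this illustration, not part of any module.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper: return the raw bytes print() emits for $text
# when the handle is opened with the given PerlIO layer.
sub emitted_bytes {
    my ($text, $layer) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer)
        or die "Cannot open in-memory handle: $!";
    print {$fh} $text;
    close($fh) or die "Cannot close in-memory handle: $!";
    return $buffer;
}

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my $a_ring = "\N{LATIN SMALL LETTER A WITH RING ABOVE}";

# No encoding layer: the character goes out as one Latin-1 byte.
print ':raw             ', hex_bytes(emitted_bytes($a_ring, ':raw')), "\n";              # E5
# UTF-8 encoding layer: the character goes out as two UTF-8 bytes.
print ':encoding(UTF-8) ', hex_bytes(emitted_bytes($a_ring, ':encoding(UTF-8)')), "\n";  # C3 A5
```

A terminal set to UTF-8 can render the second byte sequence but not the first, which would explain why adding -C or the open pragma changes what appears on screen.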

    — Ken

      > The code you presented only contains 7-bit ASCII characters.

      erm ... ???

      update

      > Do not use this pragma for anything else than telling Perl that your script is written in UTF-8.

      Unfortunately this line is easily misunderstood. I recently had a long dispute with a camel award winner who read it wrongly.

      Many think it only means you can use Unicode characters for identifiers, like $mhre or sub ne, but it also covers literal strings and anything read through the same file handle, DATA.

      Please note how the UTF8 flag is set for $t2 (see FLAGS)

      use v5.12;
      use warnings;
      use Devel::Peek;

      my $t1='åäö';
      Dump $t1;

      use utf8;
      my $t2='åäö';
      Dump $t2;

      my $t3 = "\N{LATIN SMALL LETTER A WITH RING ABOVE}\n";
      say $t3;
      Dump $t3;
      OUTPUT:
      SV = PV(0xd9ae08) at 0x25809b0
        REFCNT = 1
        FLAGS = (POK,IsCOW,pPOK)
        PV = 0x260a4e8 "\303\245\303\244\303\266"\0
        CUR = 6
        LEN = 10
        COW_REFCNT = 1
      SV = PV(0xd9add8) at 0x2580248
        REFCNT = 1
        FLAGS = (POK,IsCOW,pPOK,UTF8)
        PV = 0x260a068 "\303\245\303\244\303\266"\0 [UTF8 "\x{e5}\x{e4}\x{f6}"]
        CUR = 6
        LEN = 10
        COW_REFCNT = 1
      SV = PV(0xd9afe8) at 0x2580a40
        REFCNT = 1
        FLAGS = (POK,IsCOW,pPOK,UTF8)
        PV = 0x2767378 "\303\245\n"\0 [UTF8 "\x{e5}\n"]
        CUR = 3
        LEN = 10
        COW_REFCNT = 1

      UPDATE2

      extended the code with $t3, which doesn't print an empty line for me but

      UPDATE3

      of course, how the print is displayed depends also on the output channel and the display settings.
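To make the dependence on the output channel concrete, here is a small sketch that encodes the same character two ways and checks whether each result is well-formed UTF-8, which is what a UTF-8 terminal expects. It uses the core Encode module. is_valid_utf8() is a hypothetical helper written for this example, and the two encodings are only illustrative choices.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: true if $bytes decodes strictly as UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK argument may modify its input
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

my $char = "\x{E5}";      # LATIN SMALL LETTER A WITH RING ABOVE

for my $encoding ('UTF-8', 'ISO-8859-1') {
    my $bytes = encode($encoding, $char);
    printf "%-10s %d byte(s), valid UTF-8: %s\n",
        $encoding, length($bytes), is_valid_utf8($bytes) ? 'yes' : 'no';
}
```

The ISO-8859-1 form is a single byte that a UTF-8 display cannot interpret, so the same Perl string can look correct, garbled, or empty depending on which bytes reach the terminal.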

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        erm ... ???

        I had the same thought on my first reading of that post, and even sent a /msg to that effect.

        But then I reread kcott's post, and saw that the second block of code from the earlier post was the code kcott focused on, and was presumably the code that kcott said didn't need use utf8; -- which seems right, because it doesn't contain non-ASCII characters.

        G'day Rolf,

        I wrote my post (to which you replied) very early this morning ... then there was $work ... now it's my lunchtime and I see your original question has been resolved (with some input from pryrt). So, there's probably little more for me to say about that specifically.

        Thanks for all of the extra work you did: Devel::Peek and so on.

        By the way, I got a huge laugh from your latest user image. Thanks for that also.

        — Ken

Re^3: UTF8 Output with XML::Feed? (use utf8)
by LanX (Saint) on Mar 07, 2022 at 20:02 UTC
    I can't comment on XML::Feed, sorry.

    But ...

    > Adding use utf8; was one of the first things I tried.

    ... if your source-code is in utf8 (check your editor settings) and you have a line like my $t='abc...'; you must apply use utf8;

    Otherwise Perl will not know how to decode the bytes in that string, because the interpretation is not obvious.

    You should clarify this, before meddling with XML.

    Here is a demo you should run:

    use v5.12;
    use warnings;
    use Data::Dump;

    my $t1='åäö';
    ddx $t1;
    say "length: ",length $t1;

    use utf8;
    my $t2='åäö';
    ddx $t2;
    say "length: ",length $t2;
    OUTPUT:
    # demo_utf8.pl:8: "\xC3\xA5\xC3\xA4\xC3\xB6"   <-- bytes
    length: 6
    # demo_utf8.pl:14: "\xE5\xE4\xF6"   <-- code points
    length: 3
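The same bytes-versus-code-points distinction can be reproduced without relying on the editor's encoding. The sketch below writes the six bytes from the output above as escapes, so the source stays pure ASCII, and decodes them explicitly with the core Encode module. It illustrates roughly what use utf8 does for string literals and is not a replacement for the pragma. code_points() is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: list the code points of a string as U+XXXX.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $string;
}

# The six UTF-8 bytes from the demo output, written as escapes.
my $bytes = "\xC3\xA5\xC3\xA4\xC3\xB6";

# Explicit decoding turns the byte string into a three-character string.
my $chars = decode('UTF-8', $bytes);

print 'bytes: length ', length($bytes), "\n";                               # 6
print 'chars: length ', length($chars), ' (', code_points($chars), ")\n";   # 3 (U+00E5 U+00E4 U+00F6)
```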

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery