Bug in Perl 5.6.1 ?

by gregorovius (Friar)
on Dec 08, 2001 at 19:39 UTC

gregorovius has asked for the wisdom of the Perl Monks concerning the following question:

The following code, when run under Perl 5.6.0 and earlier, correctly prints the latin1 variable in the second print statement. Under Perl 5.6.1 this variable mysteriously reverts to UTF8: each German character is printed as the two bytes of its UTF8 encoding and looks like gibberish. The first print statement shows that interpolating utf8 and latin1 variables together works under Perl 5.6.1, so the utf8 variable obtained from XML::RSS must be special in some way that confuses Perl 5.6.1. XML::RSS uses XML::Parser to parse XML, so the problem could also be in XML::Parser.

Note that if I convert the $utf8_from_xml_rss variable to latin1 before printing, the print statement works fine.
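A minimal sketch of that conversion, assuming Unicode::String's utf8 constructor and latin1 method (the same module the script in this post loads). The hex-escaped title is a hypothetical stand-in for $item->{title}, written as raw bytes so the example does not depend on the encoding of the source file:

```perl
#!/usr/bin/perl -w
use strict;
use Unicode::String;

# Hypothetical stand-in for $item->{title}: the UTF-8 bytes of "Größter".
my $utf8_from_xml_rss = "Gr\xC3\xB6\xC3\x9Fter";

# The same word in latin1: one byte per character.
my $latin1 = "Gr\xF6\xDFter";

# Decode the UTF-8 bytes, then re-encode the result as latin1.
my $title_latin1 = Unicode::String::utf8( $utf8_from_xml_rss )->latin1;

# Both sides of the dash are now latin1.
print "2: $title_latin1 - $latin1 \n";
```

This only sidesteps the symptom by keeping every interpolated string in one encoding; it does not explain why 5.6.1 behaves differently.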

This almost caused me a headache last week, when a client on a different continent and behind a firewall reported that a program my company supplied was printing gibberish, and we couldn't reproduce it.

The program follows, with the output it produces appended as comments.

#!/usr/bin/perl -w
use strict;
use Unicode::String;
use XML::RSS;

my $latin1 = "Größter Anstieg seit März 1998";
my $utf8 = Unicode::String::latin1( $latin1 )->utf8;

print "1: $utf8 - $latin1 \n";

my $rss_content = <<'EOF';
<?xml version="1.0" encoding="ISO-8859-1" ?><rss version="0.91"><channel><item>
<title>Größter Anstieg seit März 1998</title></item></channel></rss>
EOF

my $rss = new XML::RSS;
$rss->parse( $rss_content );

foreach my $item ( @{ $rss->{items} } ) {
    # XML::RSS always returns its findings in UTF8
    my $utf8_from_xml_rss = $item->{title};
    print "2: $utf8_from_xml_rss - $latin1 \n";
}

# under Perl 5.6.0 and earlier the output is:
# 1: GröÃter Anstieg seit März 1998 - Größter Anstieg seit März 1998
# 2: GröÃter Anstieg seit März 1998 - Größter Anstieg seit März 1998

# under Perl 5.6.1 the output is:
# 1: GröÃter Anstieg seit März 1998 - Größter Anstieg seit März 1998
# 2: GröÃter Anstieg seit März 1998 - GröÃter Anstieg seit März 1998

Replies are listed 'Best First'.
Re: Bug in Perl 5.6.1 ?
by jlongino (Parson) on Dec 08, 2001 at 23:40 UTC
    I'm running the following:
    This is perl, v5.6.1 built for MSWin32-x86-multi-thread
    Binary build 628 provided by ActiveState Tool Corp. http://www.ActiveState.com
    Built 15:41:05 Jul 4 2001

    Unicode-String [2.06 ] String of Unicode characters (UCS2/UTF16)
    XML-RSS        [0.95 ] creates and updates RSS files
    XML-Parser     [2.27 ] A Perl module for parsing XML documents
    These are the results I got (a,b) from your program compared to your 5.6.0 results (1,2):
    1: GröÃter Anstieg seit März 1998 - Größter Anstieg seit März 1998
    2: GröÃter Anstieg seit März 1998 - Größter Anstieg seit März 1998
    a: Größter Anstieg seit März 1998 - Größter Anstieg seit März 1998
    b: Größter Anstieg seit März 1998 - Größter Anstieg seit März 1998
    Note the one extra character in each. Adding use locale; made no difference. I've looked closely for cut/paste errors but don't see any. Think the module versions have anything to do with it?

    --Jim

      Ah, your results look OK. The problem may happen only under 5.6.1 for Linux. Note that the string to the left of the dash is not supposed to look right, because your terminal expects latin1 and that string is UTF8. The one difference I see is that UTF8 chars are rendered as 3 characters on your system.

      This is what I'm running:

      This is perl, v5.6.1 built for i686-linux
      Copyright 1987-2001, Larry Wall

      XML::RSS v0.97
      XML::Parser v2.30
Re: Bug in Perl 5.6.1 ?
by dws (Chancellor) on Dec 08, 2001 at 22:29 UTC
    I don't see use locale; in your code, and am wondering if that might make a difference in behavior.

      Nope, it doesn't...
