Re: Regex to encode entities in XML

by mirod (Canon)
on Jun 11, 2001 at 10:22 UTC


in reply to Regex to encode entities in XML

Generating valid XML for the CB might actually be harder than it looks, as I am not sure how easy it is to figure out the encoding of the messages.

The problem you have might be a bug in XML::Parser: if I use the regexp and then HTML::Entities, I get the proper result with XML::Parser 2.27 but the wrong one with XML::Parser 2.30 (it looks like characters lose their UTF-8'edness with the latter).

The solution is either to use Text::Iconv or the Unicode modules, as described in my first post about encodings, or to go module lifting once again and grab code from XML::DOM:

sub safe_encode
{
    my $str = shift;
    # Replace each multi-byte UTF-8 sequence with a numeric character reference
    $str =~ s{([\xC0-\xDF].|[\xE0-\xEF]..|[\xF0-\xFF]...)}
             {XmlUtf8Decode ($1)}egs;
    return $str;
}

sub XmlUtf8Decode
{
    my ($str, $hex) = @_;
    my $len = length ($str);
    my $n;

    if ($len == 2)
    {
        my @n = unpack "C2", $str;
        $n = (($n[0] & 0x3f) << 6) + ($n[1] & 0x3f);
    }
    elsif ($len == 3)
    {
        my @n = unpack "C3", $str;
        $n = (($n[0] & 0x1f) << 12) + (($n[1] & 0x3f) << 6) +
             ($n[2] & 0x3f);
    }
    elsif ($len == 4)
    {
        my @n = unpack "C4", $str;
        $n = (($n[0] & 0x0f) << 18) + (($n[1] & 0x3f) << 12) +
             (($n[2] & 0x3f) << 6) + ($n[3] & 0x3f);
    }
    elsif ($len == 1)    # just to be complete...
    {
        $n = ord ($str);
    }
    else
    {
        die "bad value [$str] for XmlUtf8Decode";
    }
    # Hexadecimal reference if $hex is true, decimal otherwise
    $hex ? sprintf ("&#x%x;", $n) : "&#$n;";
}

This will encode all non-ASCII characters as &#nnn; where nnn is the decimal Unicode code point of the character. This seems to display properly, at least in Opera on Linux.
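For comparison, here is a minimal sketch of the same idea using the core Encode module. It assumes a Perl recent enough to ship Encode and input that really is UTF-8 encoded bytes. The name encode_non_ascii is made up for illustration and is not part of XML::DOM or any other module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from XML::DOM): takes a string of UTF-8
# encoded bytes and returns plain ASCII with numeric character references.
sub encode_non_ascii {
    my ($bytes) = @_;

    # Turn the raw bytes into a Perl character string.
    # FB_CROAK makes decode() die on malformed UTF-8 instead of guessing.
    my $chars = decode('UTF-8', $bytes, Encode::FB_CROAK);

    # Replace every character above 0x7F with its decimal reference.
    $chars =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;

    return $chars;
}

# "\xC3\xA9" is the UTF-8 encoding of U+00E9 (e with acute accent)
print encode_non_ascii("caf\xC3\xA9"), "\n";    # prints caf&#233;
```

Like safe_encode, this only deals with non-ASCII characters, so &, < and > still have to be escaped separately.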

Let me know if this solves your problem.
