PerlMonks  

XML::Parser and numeric entities

by gam3 (Curate)
on Jan 14, 2010 at 01:32 UTC

gam3 has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way to keep XML::Parser from converting numeric entities into UTF8?

Or is there some other parser that will let me do this?

use strict;
use XML::Parser;
use vars qw($parser);

sub handle_start {
    my $self = shift;
    my $x    = shift;
    print "<" . $x . '>';
}

sub handle_end {
    my $self = shift;
    my $x    = shift;
    print "</" . $x . '>';
}

sub handle_char {
    my $self = shift;
    my $x    = shift;
    print $x;
}

$parser = XML::Parser->new(
    Handlers => {
        Start => \&handle_start,
        End   => \&handle_end,
        Char  => \&handle_char,
    }
);

$parser->parse(<<XML);
<start>&#8211;</start>
XML
I would like this program to output
<start>&#8211;</start>
not
<start>–</start>
-- gam3
A picture is worth a thousand words, but takes 200K.

Re: XML::Parser and numeric entities
by ikegami (Patriarch) on Jan 14, 2010 at 02:53 UTC

    XML::Parser simply decodes the entities. It doesn't then encode the resulting character using UTF-8.
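A minimal sketch of that point, using the same XML::Parser Char handler as the question: the handler receives a one-character Perl string (U+2013, i.e. 8211), not a sequence of UTF-8 bytes. Bytes are only produced when the string is written through an output layer; the `:encoding(UTF-8)` layer below is one choice picked for illustration.

```perl
use strict;
use warnings;
use XML::Parser;

# Collect whatever character data the parser hands to the Char handler.
my $text = '';
my $parser = XML::Parser->new(
    Handlers => { Char => sub { my ($expat, $chunk) = @_; $text .= $chunk } },
);
$parser->parse('<start>&#8211;</start>');

# The reference has been decoded into a single character.
printf "length=%d ord=%d\n", length($text), ord($text);   # prints length=1 ord=8211

# Encoding happens only on output, through the I/O layer you choose.
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```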

    If you want all non-ASCII characters encoded as numeric entities on output, you can use:

    use HTML::Entities qw( encode_entities_numeric );

    sub handle_char {
        my $self = shift;
        my $x    = shift;
        print encode_entities_numeric($x);
    }
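One caveat: `encode_entities_numeric` always emits hexadecimal references, so the handler above prints `&#x2013;` rather than the decimal `&#8211;` asked for. If the decimal form matters, a hand-rolled encoder is one option. The sketch below assumes that escaping `&`, `<` and `>` and converting every non-ASCII character to `&#NNNN;` is enough for this output; `encode_decimal` is a hypothetical helper, not part of any module.

```perl
use strict;
use warnings;
use XML::Parser;

# Output is accumulated in a string so it can be inspected afterwards.
my $out = '';

# Hypothetical helper: escape XML metacharacters, then replace every
# non-ASCII character with a decimal numeric reference (&#NNNN;).
sub encode_decimal {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first so later escapes aren't doubled
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub { my ($expat, $name) = @_; $out .= "<$name>" },
        End   => sub { my ($expat, $name) = @_; $out .= "</$name>" },
        Char  => sub { my ($expat, $text) = @_; $out .= encode_decimal($text) },
    }
);

$parser->parse('<start>&#8211;</start>');
print "$out\n";    # prints <start>&#8211;</start>
```

Because the parser has already decoded everything, this re-encodes every non-ASCII character, including ones that were literal in the source, so the output is normalized rather than a byte-for-byte copy.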

    There's also a handler you can use instead of Char that receives the entities still encoded, but then you're not guaranteed to have all non-ASCII characters encoded: any non-ASCII character that appears literally in the source comes through as-is.
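If the handler meant here is XML::Parser's Default handler (an assumption; the reply does not name it), a sketch looks like this. With no Char handler registered, expat passes character data to Default as it appeared in the source, so the character reference reaches the handler undecoded. Literal non-ASCII characters would pass through unencoded as well, which is the caveat about coverage.

```perl
use strict;
use warnings;
use XML::Parser;

my $out = '';

my $parser = XML::Parser->new(
    Handlers => {
        Start   => sub { my ($expat, $name) = @_; $out .= "<$name>" },
        End     => sub { my ($expat, $name) = @_; $out .= "</$name>" },
        # No Char handler is set, so character data (including the
        # &#8211; reference) falls through to Default in its source form.
        Default => sub { my ($expat, $text) = @_; $out .= $text },
    }
);

$parser->parse('<start>&#8211;</start>');
print "$out\n";
```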

      Thank you for that information; I can use it to patch up my problem.

      However, what I really want is for XML::Parser NOT to decode the numeric entities at all.

      -- gam3
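A sketch that comes closer to leaving the entities undecoded, under the assumption that XML::Parser::Expat's `original_string` method returns the verbatim source text for the current event when called inside a Char handler. The parser still decodes internally, but the handler writes out the original text instead of the decoded string, so `&#8211;` (and any other reference) is echoed exactly as written.

```perl
use strict;
use warnings;
use XML::Parser;

my $out = '';

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub { my ($expat, $name) = @_; $out .= "<$name>" },
        End   => sub { my ($expat, $name) = @_; $out .= "</$name>" },
        # Ignore the decoded text passed as the second argument and
        # copy the source text that triggered this Char event instead.
        Char  => sub { my ($expat) = @_; $out .= $expat->original_string },
    }
);

$parser->parse('<start>&#8211;</start>');
print "$out\n";
```

Note that `original_string` is in the document's original encoding, so for non-UTF-8 input the copied text may need decoding before it is mixed with other strings.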

Node Type: perlquestion [id://817319]
Approved by herveus