PerlMonks  

Need advice on HTML entities

by wfsp (Abbot)
on Feb 07, 2005 at 17:14 UTC ( [id://428749] )

wfsp has asked for the wisdom of the Perl Monks concerning the following question:

I've got into a tangle using HTML::Entities v 1.27.

Using WinXP, ActiveState 5.8.6 build 811.

Any ideas on what I'm doing wrong?

#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities;

my $string = "&rsquo;";
print "string: $string\n";                      # &rsquo
decode_entities($string);
print "de entitied: $string\n";                 # ’
print "ord of string:" . ord($string) . "\n";   # 226
encode_entities($string);
print "re entitied: $string\n";                 # â€&#153
my $character = encode_entities(chr(226));
print "character 226: $character\n";            # &acirc
$character = encode_entities(chr(8217));
print "character 8217: $character\n";           # &rsquo

I think I need decode_entities to return chr(8217) and not chr(226).

Thanks in advance

Replies are listed 'Best First'.
Re: Need advice on HTML entities
by Tanktalus (Canon) on Feb 07, 2005 at 17:24 UTC

    Works better here, although I'm getting a warning: "Wide character in print at z line 10." Because my perl is converting this to UTF-8 successfully, the ord is coming out as 8217. The 226 you're getting is simply the first byte of a multi-byte character, because your perl is treating the multi-byte character as a series of single-byte characters.

    You may try something like this:

    C:\> set LANG=en_US.UTF-8
    C:\> perl html_entities_test.pl
    I'm not sure how well it'll work for you, though.
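To see where the 226 comes from, here is a minimal sketch that uses only the core Encode module. It is not code from the thread, and the variable names are made up for illustration. It encodes U+2019 (the character that &rsquo; stands for) into UTF-8 and lists the resulting byte values.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+2019 RIGHT SINGLE QUOTATION MARK, the character that &rsquo; stands for.
my $char = chr(8217);

# Encode the single character into its UTF-8 byte sequence.
my $bytes = encode('UTF-8', $char);

# Split the byte string into individual byte values (0..255).
my @byte_values = map { ord } split //, $bytes;

printf "characters: %d, ord: %d\n", length($char), ord($char);
printf "bytes: %d, values: %s\n", length($bytes), join(' ', @byte_values);

# Decoding the bytes again gives back the one original character.
my $round_trip = decode('UTF-8', $bytes);
printf "decoded ord: %d\n", ord($round_trip);
```

The first byte value is 226, which matches the ord reported in the question: calling ord on the three-byte string returns only its first byte.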

Re: Need advice on HTML entities
by borisz (Canon) on Feb 07, 2005 at 17:39 UTC
    You could use
    HTML::Entities::decode_entities_old($string)
    that's the old Perl fallback. Or make your string UTF-8 before you pass it to decode_entities.
    my $string = "&rsquo;";
    chop( $string .= chr(0x1234) );
    print "string: $string\n";    # &rsquo
    decode_entities($string);
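What the chop trick does can be checked with the built-in utf8::is_utf8 function. The sketch below is an illustration only, with made-up variable names. It shows that appending a character above 255 and chopping it off again leaves the text unchanged but switches the string to Perl's internal UTF-8 representation, which is also what utf8::upgrade does directly. Whether decode_entities behaves differently for such a string depends on the installed HTML::Parser version, as this reply suggests; the sketch does not test that part.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A plain ASCII literal is not stored as UTF-8 internally.
my $string = "&rsquo;";
my $before = utf8::is_utf8($string) ? 1 : 0;

# Append a character above 255 (which forces an upgrade), then chop it off.
chop( $string .= chr(0x1234) );
my $after = utf8::is_utf8($string) ? 1 : 0;

# utf8::upgrade asks for the same internal representation directly.
my $other = "&rsquo;";
utf8::upgrade($other);
my $upgraded = utf8::is_utf8($other) ? 1 : 0;

print "flag before: $before, after chop trick: $after, after upgrade: $upgraded\n";
print "text unchanged: ", ( $string eq "&rsquo;" ? "yes" : "no" ), "\n";
```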
    And since we want to print a UTF-8 character, put STDOUT into UTF-8 mode too.
    # at the top of the program
    binmode STDOUT, ":utf8";
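The effect of the :utf8 layer can be seen without a terminal by writing to an in-memory filehandle and inspecting the bytes. This is a sketch under that assumption, not part of the original reply; $buffer and $fh are illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The character that decode_entities is expected to produce for &rsquo;.
my $char = chr(8217);

# Write to an in-memory filehandle so the output bytes can be inspected.
my $buffer = '';
open( my $fh, '>', \$buffer ) or die "cannot open in-memory handle: $!";
binmode( $fh, ':utf8' );    # same layer as binmode STDOUT, ":utf8"
print {$fh} $char;
close($fh);

# The layer has turned the single character into its UTF-8 bytes.
my @bytes = map { ord } split //, $buffer;
print "bytes written: @bytes\n";
```

With the layer in place, printing the character also no longer triggers the "Wide character in print" warning. The stricter `:encoding(UTF-8)` layer may be preferable in newer code.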
    UPDATE: Or, even better, update HTML::Parser and it works out of the box. Tested with 3.43.
    Boris
      ...update HTML::Parser...

      That did it! Many thanks,
      John
