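# Collapse two-byte UTF-8 sequences (lead byte 0xC0-0xC3) into the
# single character in the range 0x00-0xFF that they encode.
sub encode
{
    my ($text) = @_;
    # Require a real continuation byte (10xxxxxx) rather than any character.
    $text =~ s{([\xc0-\xc3])([\x80-\xbf])}{
        my $hi = ord($1);    # lead byte: 110000xx
        my $lo = ord($2);    # continuation byte: 10xxxxxx
        chr((($hi & 0x03) << 6) | ($lo & 0x3F))
    }ge;
    return $text;
}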
--
Check out my Perlmonks Related Scripts like framechat,
reputer, and xNN.
I need a regexp that will convert high-bit characters (e.g. > \0377 ?) to the appropriate character entity (e.g. &#x123;).
No, you do not need a regexp that will do that. There's a very nice module that does HTML entities: HTML::Entities.
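For reference, a minimal sketch of what using the module might look like, assuming HTML::Entities is installed from CPAN and the input is an ISO-8859-1 byte string. The sample text and variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

# Illustrative Latin-1 input: 0xE9 is e-acute, 0xE8 is e-grave.
my $text = "caf\xE9 & cr\xE8me";

# Replace high-bit characters and HTML-special characters with entities.
my $html = encode_entities($text);
print "$html\n";

# Convert the entities back and check that nothing was lost.
my $back = decode_entities($html);
print $back eq $text ? "round trip ok\n" : "mismatch\n";
```

If named entities are not wanted, the module also exports encode_entities_numeric on request.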
When I'm in a hurry, I often use s/(\W)/'&#' . ord($1) . ';'/ge for dumping data, because it's easy to convert back to the original, and encoding printable \W characters doesn't hurt.
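A rough sketch of that quick-and-dirty round trip, using only core Perl. The replacement is an expression, so the substitution needs the /e modifier; the sample string and variable names are illustrative.

```perl
use strict;
use warnings;

my $original = "a < b & c";

# Encode every non-word character as a decimal numeric entity.
(my $encoded = $original) =~ s/(\W)/'&#' . ord($1) . ';'/ge;
print "$encoded\n";

# Decode: turn each numeric entity back into its character.
(my $decoded = $encoded) =~ s/&#(\d+);/chr($1)/ge;
print $decoded eq $original ? "round trip ok\n" : "mismatch\n";
```

Because '&', '#' and ';' are themselves non-word characters, any entity-like text already in the input gets encoded too, so decoding stays unambiguous.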
- Yes, I reinvent wheels.
- Spam: Visit eurotraQ.
I have written a little script that does something similar. I started from HTML::Entities, but I store my documents in UTF-8 and that module assumes ISO-8859-1, so it didn't work as it was.
So I converted the module's hash (characters to entity names) to UTF-8 using iconv, added a switch, and it now works. You can find it served from my PC (it's a DynDNS name, so it may sometimes be off-line).
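This is not the script mentioned above, just a minimal sketch of a different way to handle UTF-8 input with core modules only: decode the bytes to characters first, then emit numeric entities instead of named ones. The helper name utf8_to_entities is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: takes UTF-8 encoded bytes and returns an ASCII
# string with every non-ASCII character written as a numeric entity.
sub utf8_to_entities {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);    # bytes -> Perl characters
    $chars =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $chars;
}

# "caf" followed by e-acute as the two UTF-8 bytes 0xC3 0xA9.
my $result = utf8_to_entities("caf\xC3\xA9");
print "$result\n";
```

Numeric entities avoid the need for a name table, at the cost of less readable output.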