Hi
Hmm, in that case I think I misunderstood your problem. I still think you should use some XML technology, though ;-) If you are doing simple substitutions, could you do them with XSLT?
However, perhaps your problem is not with XML representations but with reading Unicode in. Assuming you're using Perl v5.8-v5.10, how are you opening the file? You need to tell Perl which encoding the file uses - presumably UTF-8.
You can do this in a number of ways:
# Option 1: use binmode on the filehandle
open my $fh, '<', "file" or die "... $!";
binmode $fh, ':utf8';

# Option 2: open $fh for reading UTF-8
open(my $fh, "<:encoding(UTF-8)", "file") or die "... $!";

# Option 3: use the open pragma to open all input files as UTF-8
# see http://perldoc.perl.org/open.html
use open IN => ':utf8';

# Option 4: decode each data item manually (decode_utf8 comes from Encode)
use Encode qw(decode_utf8);
$str = decode_utf8( $str );
In your case, it is easiest to use binmode on the filehandle, at least to find out whether this is the problem.
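If it helps, here is a rough sketch of how you might check whether decoding is the issue. It is only an illustration: the temporary file, the sample string and the read_first_line helper are all made up for this example and are not from your code. It writes one line containing an accented character, then reads it back with and without the :utf8 layer and compares the lengths.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small UTF-8 test file (hypothetical sample data).
# "\x{e9}" is e-acute, which takes two bytes when encoded as UTF-8.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':encoding(UTF-8)';
print {$out} "caf\x{e9}\n";
close $out or die "Cannot close $path: $!";

# Hypothetical helper: read the first line of a file,
# optionally applying an I/O layer with binmode.
sub read_first_line {
    my ($file, $layer) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    binmode $fh, $layer if defined $layer;
    my $line = <$fh>;
    close $fh;
    chomp $line;
    return $line;
}

my $raw     = read_first_line($path);           # no layer: raw bytes
my $decoded = read_first_line($path, ':utf8');  # layer applied: characters

printf "without layer: length %d\n", length $raw;
printf "with :utf8:    length %d\n", length $decoded;
```

If the two lengths differ (on a default setup with no PERL_UNICODE or open pragma in effect, I would expect 5 without the layer and 4 with it), then the file was being read as bytes rather than characters, and adding the layer is likely the fix.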
There are many documents trying to explain Unicode in Perl. I quite like this one. Be aware that Unicode support and the surrounding issues have changed quite a lot between Perl versions. v5.6 is completely different to the above, for example.
FalseVinylShrub
Disclaimer: Please review and test code, and use at your own risk... If I answer a question, I would like to hear if and how you solved your problem.