PerlMonks
Re^3: substr on UTF-8 strings
by haukex (Archbishop) on Jun 24, 2020 at 12:46 UTC [id://11118421]
"But then a Unicode character comes up, and suddenly writing text to stdout produces garbage characters and Perl issues a warning about it."

Add use open qw/:std :utf8/; at the top of your code to open STDIN/OUT/ERR as UTF-8 (assuming your console is UTF-8).

"I verify that strings are flagged as native or as UTF-8 at the places where they should be."

You should only be checking the UTF8 flag for debugging purposes when you find problems with your code, and not assuming what its state should be. It's an internal flag that can (and will) change across Perl versions.

"Perl will then have a mixture of 'native' and 'UTF-8' strings to concatenate. How does that work? Even if there are no characters above 127, Perl will have to scan all 'native' strings, if only to issue a warning for high characters. Is that right? If all strings were flagged as UTF8, concatenation should be faster, shouldn't it?"

I think you're worrying too much about the internals here. Perl generally does the right thing; you should only worry about it if you actually have problems with your code (write tests to check the input and output of your code), and you should only worry about speed if it becomes an issue for you.

In general, for the best Unicode support, use the latest version of Perl (5.26 is pretty good, but think about upgrading using e.g. perlbrew), encode your source files as UTF-8, and start them off like this:
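A minimal sketch of what such a preamble could look like, built from the two pragmas named in this post (use open qw/:std :utf8/ and use warnings FATAL => 'utf8'). The use utf8 line, the sample strings, and the variable names are assumptions added for illustration, not necessarily what the original snippet contained:

```perl
use warnings;
use strict;
use utf8;                       # assumption: this source file is saved as UTF-8
use warnings FATAL => 'utf8';   # UTF-8 related warnings become fatal errors
use open qw/:std :utf8/;        # STDIN/STDOUT/STDERR and default open() layers are UTF-8

my $str = "Zürich 東京";         # hypothetical sample text
print length($str), "\n";       # counts characters, not bytes
print substr($str, 0, 6), "\n"; # first six characters

# Mixing a string that fits in Latin-1 with one that does not needs no special care
my $latin1 = "caf\x{e9}";
my $joined = $latin1 . " " . $str;
print $joined, "\n";
```

With this in place, length and substr operate on characters, and the concatenation works without looking at the internal UTF8 flag at all.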
And make sure to always specify the correct encoding when opening files ("open" Best Practices). If you have problems, feel free to post them here, see also my advice on that here.

"I think I will code the removal of trailing slashes with a regular expression, as that should respect the flag."

See the core module File::Spec for how to do operations on filenames in a portable way.

Update: Added "use warnings FATAL => 'utf8';"
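As a rough illustration of both points, here is a sketch that cleans up a trailing slash with File::Spec->canonpath instead of a hand-written regex and then opens a file with an explicit encoding layer. The temporary directory, the file name sample.txt, and the sample text are all hypothetical:

```perl
use warnings;
use strict;
use utf8;                       # the string literal below is UTF-8 in the source
use File::Spec;
use File::Temp qw/tempdir/;

# canonpath does a logical cleanup of the path, which includes
# dropping a trailing directory separator
my $dir  = File::Spec->canonpath( tempdir(CLEANUP => 1) . '/' );
my $file = File::Spec->catfile($dir, 'sample.txt');   # hypothetical file name

my $text = "Zürich 東京";        # hypothetical sample text

# Always name the encoding when writing ...
open my $out, '>:encoding(UTF-8)', $file or die "$file: $!";
print $out $text;
close $out or die "$file: $!";

# ... and when reading the file back
open my $in, '<:encoding(UTF-8)', $file or die "$file: $!";
my $read = do { local $/; <$in> };
close $in;

print length($read), " characters, ", -s $file, " bytes\n";
```

The character count and the byte count differ because the file holds the UTF-8 encoding of the text, while the string read back through the encoding layer holds decoded characters.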