PerlMonks
Re^4: substr on UTF-8 strings
by rdiez (Acolyte) on Jun 24, 2020 at 13:14 UTC ( [id://11118427]=note )
"Add use open qw/:std :utf8/; at the top of your code to open STDIN/OUT/ERR as UTF-8 (assuming your console is UTF-8)."

Yes, that is the usual advice. But it is wrong in practice, in my opinion.

First of all, who knows where my script, or parts of it, will land? Maybe on Windows. I once wrote a Perl script that I used heavily on Windows. Why should I assume that my console is using UTF-8?

But most importantly, if you automatically open all files as UTF-8, you will run into serious limitations. Say a file has an invalid UTF-8 sequence. What will Perl do? Die on read? Or just write a warning to stderr, so that the script will never know? Such warnings do not really help the end user. If you tell Perl not to check the file for UTF-8 validity, will it really not check? Perl does many string operations internally, and any one of them may suddenly write a warning to stderr. What if you do want to check for UTF-8 encoding errors? What if your file mixes binary and UTF-8? Life is not that simple.

In my current script, I am reading a UTF-8 text file. I open the file in "raw" mode and decode every text line myself. This way, when a line has UTF-8 encoding errors, my script can cleanly tell the user which line number the error is on. You cannot do that if you let Perl I/O handle things magically.

"See the core module File::Spec for how to do operations on filenames in a portable way."

File::Spec is a disaster. Just try this:
This is what you get:
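A minimal sketch of the kind of call that shows this, assuming splitdir on a path with a doubled and a trailing slash. The path and the variable names are made up for illustration, and File::Spec::Unix is used directly so that the result does not depend on the platform:

```perl
use strict;
use warnings;
use File::Spec::Unix;   # pin the Unix rules so the result does not depend on $^O

# Hypothetical example path with a doubled and a trailing slash.
my $path = '/usr//local/lib/';

# splitdir splits on every '/', so empty strings appear for the
# leading slash, the doubled slash and the trailing slash.
my @raw = File::Spec::Unix->splitdir($path);
print join(',', map { "[$_]" } @raw), "\n";
# prints: [],[usr],[],[local],[lib],[]

# One manual workaround: canonicalise first, then drop empty components.
# Note that this also drops the leading '' that marks an absolute path.
my @clean = grep { length }
            File::Spec::Unix->splitdir(File::Spec::Unix->canonpath($path));
print join(',', map { "[$_]" } @clean), "\n";
# prints: [usr],[local],[lib]
```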
It does not collapse multiple '/' separators, as POSIX says you should. It adds empty directories before the first / and after the last /. That weird behavior is not documented (or did I miss it?). That does not really help. You have to do everything manually if you want the job to be done properly. It is actually a shame, because I really like Perl.
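As for the raw-mode reading mentioned earlier, a minimal sketch could look like the following. It assumes the core Encode module with strict 'UTF-8' decoding and FB_CROAK. The function name read_utf8_lines and the error message format are made up for illustration and are not taken from the actual script:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: read a file as raw bytes and decode each line
# ourselves, so that a decoding failure can be reported with its line number.
sub read_utf8_lines {
    my ($path) = @_;

    open(my $fh, '<:raw', $path)
        or die "Cannot open $path: $!\n";

    my @lines;
    while (defined(my $bytes = <$fh>)) {
        # FB_CROAK makes decode() die on malformed input instead of
        # substituting characters; LEAVE_SRC keeps $bytes unmodified.
        my $text = eval {
            Encode::decode('UTF-8', $bytes,
                           Encode::FB_CROAK | Encode::LEAVE_SRC);
        };
        # $. holds the number of the line just read from $fh.
        die "$path, line $.: invalid UTF-8\n" unless defined $text;
        push @lines, $text;
    }
    close($fh);

    return @lines;
}
```

Called as my @lines = read_utf8_lines($filename); the caller can wrap the call in eval and show the message to the user, instead of relying on a warning that Perl's I/O layer prints to stderr.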