PerlMonks
Re: Chicanery Needed to Handle Unicode Text on Microsoft Windows
by kcott (Archbishop) on Oct 30, 2010 at 09:15 UTC ( [id://868437]=note )
As far as I can see, the only reason you need :crlf is that you've specifically added the UNIX line ending (\n) to your output. It would be better to use the platform-independent $/. The :raw layer should preserve the line endings. So that reduces the chicanery somewhat.

For anything other than ASCII files, binmode($file_handle) was required on MSWin32 systems. :raw performs the same function so, while it may appear to add to the chicanery, it certainly reduces the amount of code.

I don't have sufficient knowledge of UTF-16 to address that aspect of your post. What I would suggest is that, after removing :crlf and changing \n to $/, you try your test code without :perlio. You may still need it, but it wouldn't hurt to check.

I agree there's a lot of Unicode-related documentation; however, everything I've made reference to is available here: PerlIO.

I ran a series of tests.

Starting code:
Input files in UNIX and DOS formats:
Output after running on UNIX platform:
Output after running on DOS platform:
Changing the while loop to chomp input and add $/ (not \n) to output:
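For readers without the listing to hand, a loop of this kind might look roughly like the sketch below. It is only an illustration, not the test code itself: the helper name copy_lines, its arguments, and the way the layer string is passed in are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): copy $in_path to
# $out_path line by line, chomping each record on input and appending
# $/ (rather than a literal "\n") on output.
# $layers is an assumed PerlIO layer string such as ':raw' or ':raw:crlf'.
sub copy_lines {
    my ($in_path, $out_path, $layers) = @_;

    open my $in,  "<$layers", $in_path  or die "Cannot read $in_path: $!";
    open my $out, ">$layers", $out_path or die "Cannot write $out_path: $!";

    while (my $line = <$in>) {
        chomp $line;              # strips a trailing $/ if one is present
        print {$out} $line, $/;   # $/ is "\n" unless it has been changed
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}
```

Called as copy_lines('in.txt', 'out.txt', ':raw'), with $/ left at its default, chomp removes only the "\n", so any "\r" in a DOS-format input stays on the line and is written back out unchanged.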
New output:
Adding :crlf to the MSWin32 input and output modes (now :raw:crlf) makes no change:
With :raw:perlio:crlf, there's no change:
And, for completeness, with :raw:perlio, there's no change:
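One way to check a "no change" result like these at the byte level is to read each output file back through :raw and make the line endings visible. The following is a minimal sketch under that assumption; show_line_endings is a hypothetical helper, not part of the tests above.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp a file with no layer translation and return
# its contents with CR and LF rendered as the visible escapes \r and \n.
# A real newline is kept after each LF so the result stays readable.
sub show_line_endings {
    my ($path) = @_;

    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    my $bytes = do { local $/; <$fh> };   # undef $/ reads the whole file
    close $fh;

    $bytes = '' unless defined $bytes;
    $bytes =~ s/\r/\\r/g;       # CR becomes the two characters \r
    $bytes =~ s/\n/\\n\n/g;     # LF becomes \n followed by a real newline
    return $bytes;
}
```

Printing show_line_endings($file) for the output of each layer combination lets two runs be compared directly, without relying on how an editor or terminal displays the line endings.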
-- Ken