http://qs321.pair.com?node_id=198189


in reply to Handling Mac, Unix, Win/DOS newlines at readtime...

Good Monks!

I'd think that using \r or \n in this script should be considered harmful. If you really want to be portable (which might not be the case), you should use \015 and \012 exclusively to match DOS/Unix/Mac line endings.

Imagine this script being run on some box using EBCDIC (I know this is most probably a hypothetical assumption but, oh well, I just want to demonstrate something here...)

Read perlebcdic. It has an example saying:

$is_ebcdic_37   = "\n" eq chr(37);
$is_ebcdic_1047 = "\n" eq chr(21);

Uh-oh... That means if you split on \n under an EBCDIC system's perl, you'd actually be splitting on byte 37 or 21, which in ASCII data is '%' or NAK respectively.
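To make the difference concrete, here is a small sketch (mine, not from perlebcdic) that prints the code points involved. It relies only on the fact that octal escapes name fixed code points, while "\n" is a logical newline whose value is up to the platform. The variable names are just for illustration.

```perl
use strict;
use warnings;

# Octal escapes always denote the same code points: 10 and 13.
my $lf = ord("\012");
my $cr = ord("\015");

# "\n" is a logical newline; its code point depends on the platform.
my $nl = ord("\n");

printf "\\012=%d \\015=%d \\n=%d\n", $lf, $cr, $nl;
print "here \\n is NOT the same as \\012\n" if $nl != $lf;
```

On an ordinary ASCII box the last line stays silent; on the EBCDIC code pages quoted above it should fire.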

You really want to use HTML::Parser (or even XML::Parser) to parse your input. Failing that, at least do something like this (untested):

@lines = split /\012\015?|\015\012?/, $file;
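For comparison, a slightly stricter variant as a minimal sketch. The helper name split_lines is made up for this example. It tries CRLF first and then a lone CR or LF, so it only recognizes the three usual conventions:

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into lines, whatever the
# line-ending style (DOS CRLF, old Mac CR, Unix LF).
sub split_lines {
    my ($text) = @_;
    # CRLF must come first so it counts as ONE terminator, not two.
    # Without a limit argument, split drops trailing empty fields.
    return split /\015\012|\015|\012/, $text;
}

my @lines = split_lines("one\015\012two\012three\015four");
print scalar(@lines), " lines: @lines\n";
```

The design difference: the pattern in the post also swallows LF followed by CR as a single terminator, whereas this one treats that sequence as two line endings with an empty line in between. Which behaviour you want depends on your input.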

Whichever approach you choose, input normalization is not really a trivial problem...

So long,
Flexx