http://qs321.pair.com?node_id=1095668


in reply to Re^3: Speeds vs functionality
in thread Speeds vs functionality

I believe Modern Perl should have a module that can easily parse these simple Unicode CSV records. It should handle them in any Unicode character encoding scheme: UTF-8, UTF-16, or UTF-32. And it should handle the Unicode byte order mark seamlessly.

Why not?

🎥Film🎥🎬🎥Year🎥🎬🎥Awards🎥🎬🎥Nominations🎥🎬🎥Director🎥
🎥12 Years a Slave🎥🎬2013🎬3🎬9🎬🎥🎥🎥 Steve McQueen🎥
🎥Argo🎥🎬2012🎬3🎬7🎬🎥🎥🎥 Ben Affleck🎥
🎥The Artist🎥🎬2012🎬5🎬10🎬🎥🎥🎥 Michel Hazanavicius🎥
🎥The King's Speech🎥🎬2010🎬4🎬12🎬🎥🎥🎥 Tom Hooper🎥
🎥The Hurt Locker🎥🎬2009🎬6🎬9🎬🎥🎥🎥 Kathryn Bigelow🎥
🎥Slumdog Millionaire🎥🎬2008🎬8🎬10🎬🎥🎥🎥 Danny Boyle🎥
🎥No Country for Old Men🎥🎬2007🎬4🎬8🎬🎥🎥🎥 Joel Coen
🎥🎥 Ethan Coen🎥
🎥The Departed🎥🎬2006🎬4🎬5🎬🎥🎥🎥 Martin Scorsese🎥

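Parser attributes for the records above:

sep_char	🎬	U+1F3AC CLAPPER BOARD (UTF-8: F0 9F 8E AC)
quote_char	🎥	U+1F3A5 MOVIE CAMERA  (UTF-8: F0 9F 8E A5)
escape_char	🎥	U+1F3A5 MOVIE CAMERA  (UTF-8: F0 9F 8E A5)

The same records with the conventional comma separator and double-quote quoting:

"Film","Year","Awards","Nominations","Director"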
"12 Years a Slave",2013,3,9,"🎥 Steve McQueen"
"Argo",2012,3,7,"🎥 Ben Affleck"
"The Artist",2012,5,10,"🎥 Michel Hazanavicius"
"The King's Speech",2010,4,12,"🎥 Tom Hooper"
"The Hurt Locker",2009,6,9,"🎥 Kathryn Bigelow"
"Slumdog Millionaire",2008,8,10,"🎥 Danny Boyle"
"No Country for Old Men",2007,4,8,"🎥 Joel Coen
🎥 Ethan Coen"
"The Departed",2006,4,5,"🎥 Martin Scorsese"
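
As a rough sketch of the byte order mark handling asked for at the top, assuming the whole input is already in memory as a byte string: decode_with_bom is a hypothetical helper name, and falling back to UTF-8 when no BOM is found is an assumption, not a rule. It uses only decode() from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: detect a Unicode BOM, decode the bytes, and drop the BOM.
# Assumption: input without a BOM is treated as UTF-8.
sub decode_with_bom {
    my ($bytes) = @_;

    # The 4-byte UTF-32 marks are checked before the 2-byte UTF-16 marks,
    # because the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
    my @boms = (
        [ "\x00\x00\xFE\xFF", 'UTF-32BE' ],
        [ "\xFF\xFE\x00\x00", 'UTF-32LE' ],
        [ "\xEF\xBB\xBF",     'UTF-8'    ],
        [ "\xFE\xFF",         'UTF-16BE' ],
        [ "\xFF\xFE",         'UTF-16LE' ],
    );

    for my $pair (@boms) {
        my ($bom, $encoding) = @$pair;
        if (index($bytes, $bom) == 0) {
            # Decode everything after the BOM with the matching encoding.
            return decode($encoding, substr($bytes, length $bom));
        }
    }

    return decode('UTF-8', $bytes);    # no BOM found: assumed UTF-8
}
```

One known ambiguity: UTF-16LE text whose first character after the BOM is U+0000 is indistinguishable from UTF-32LE by the BOM alone, so this sketch would read it as UTF-32LE.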

I recognize that the current XS module for parsing CSV records, Text::CSV_XS (marvelously maintained by Tux), may not be the right module to use as the basis for a new, fully Unicode-capable module. But because Perl's native Unicode capabilities exceed those of most other programming languages, Perl should have a proper FSM-based Unicode CSV parser, even if it's pure Perl and not XS.
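
To make "FSM-based" concrete, here is a minimal pure-Perl sketch of such a parser, not a finished module. It assumes the text has already been decoded to Perl characters, that sep_char, quote_char and escape_char are each a single code point, and that records end in LF or CRLF. The name parse_csv and the state names are hypothetical; the option names mirror the attributes listed above.

```perl
use strict;
use warnings;
use 5.010;    # for the // (defined-or) operator

# Hypothetical sketch: parse decoded CSV text into an array ref of rows.
sub parse_csv {
    my ($text, %opt) = @_;
    my $sep   = $opt{sep_char}    // ',';
    my $quote = $opt{quote_char}  // '"';
    my $esc   = $opt{escape_char} // $quote;

    my (@rows, @row);
    my $field = '';
    my $state = 'START';    # START, UNQUOTED, QUOTED, QUOTE_SEEN, ESCAPED

    my $end_field = sub { push @row, $field; $field = ''; $state = 'START' };
    my $end_row   = sub { $end_field->(); push @rows, [@row]; @row = () };

    # split // on a decoded string yields code points, so an emoji is one step.
    for my $ch (split //, $text) {
        if ($state eq 'ESCAPED') {            # char after a distinct escape_char
            $field .= $ch;
            $state  = 'QUOTED';
        }
        elsif ($state eq 'QUOTED') {
            if    ($ch eq $quote) { $state = 'QUOTE_SEEN' }
            elsif ($ch eq $esc)   { $state = 'ESCAPED' }
            else                  { $field .= $ch }    # newlines included
        }
        elsif ($state eq 'QUOTE_SEEN') {
            if ($ch eq $quote && $esc eq $quote) {     # doubled quote = literal
                $field .= $ch;
                $state  = 'QUOTED';
            }
            elsif ($ch eq $sep) { $end_field->() }
            elsif ($ch eq "\n") { $end_row->() }
            elsif ($ch eq "\r") { }                    # tolerate CRLF
            else { die "Unexpected character after closing quote\n" }
        }
        else {                                         # START or UNQUOTED
            if    ($ch eq $quote && $state eq 'START') { $state = 'QUOTED' }
            elsif ($ch eq $sep)  { $end_field->() }
            elsif ($ch eq "\n")  { $end_row->() }
            elsif ($ch eq "\r")  { }                   # tolerate CRLF
            else                 { $field .= $ch; $state = 'UNQUOTED' }
        }
    }

    die "Unterminated quoted field\n"
        if $state eq 'QUOTED' || $state eq 'ESCAPED';
    $end_row->() if @row || $state ne 'START';         # last row without newline
    return \@rows;
}

# Usage with the attributes from the sample: U+1F3AC separates, U+1F3A5 quotes.
my $rows = parse_csv(
    "\x{1F3A5}Argo\x{1F3A5}\x{1F3AC}2012\x{1F3AC}"
      . "\x{1F3A5}\x{1F3A5}\x{1F3A5} Ben Affleck\x{1F3A5}\n",
    sep_char    => "\x{1F3AC}",
    quote_char  => "\x{1F3A5}",
    escape_char => "\x{1F3A5}",
);
# $rows->[0] holds three fields: "Argo", "2012", and U+1F3A5 followed by " Ben Affleck"
```

Walking the string one character at a time is slow next to XS, which is exactly the trade described in the next paragraph. The sketch also skips any CR outside quotes, so bare-CR line endings are not supported.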

I long ago accepted that Unicode conformance and comparative slowness go hand in hand 👫. So what? Look what you're trading a few seconds here and there for:  the technological foundation of World Peace ☮ and Universal Love 💕.

UPDATE:  Removed references to core module. I don't care about that. I just want a Unicode-capable Perl CSV module.