Re^9: Speeds vs functionality (utf8 csv) by tye (Sage)
on Jul 31, 2014 at 21:12 UTC
I was never considering single-byte anything. Writing code in Perl means that I don't have to (unlike writing code in XS). Yes, I actually meant what I said. Yes, I realized that your example was using multi-byte single-character tokens.
Single-character vs. multi-character (usually) leads to different approaches because [^"\\]+ as part of a regex works fine when the quote and escape values (respectively) are each a single character, but isn't even close to what you have to do if either of those is multi-character.
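To make that difference concrete, here is a rough sketch of the two regex shapes. The helper name quoted_body_re and the sample tokens are hypothetical, made up for illustration; none of this is taken from Text::xSV or any other module.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex that matches the body of a quoted
# field, given a quote token and an escape token.
sub quoted_body_re {
    my ($quote, $escape) = @_;
    my ($q, $e) = map { quotemeta } $quote, $escape;

    if (length($quote) == 1 && length($escape) == 1) {
        # Single-character tokens: a negated character class does the job.
        return qr/(?:[^${q}${e}]+|${e}.)*/s;
    }

    # Multi-character tokens: a character class cannot say "not this
    # string", so take an escaped token (or escaped character), or else
    # one character that does not begin a quote or escape token.
    return qr/(?:${e}(?:${q}|${e}|.)|(?!${q}|${e}).)*/s;
}

my $single = quoted_body_re('"', '\\');
print "single: $1\n" if '"a\\"b"c' =~ /^"($single)"/;

my $multi = quoted_body_re('||', '^^');
print "multi: $1\n" if '||a^^||b||c' =~ /^\|\|($multi)\|\|/;
```

Note that "single character" here is about length() on a decoded string, so a quote that takes several bytes in UTF-8 but is one character still takes the simple branch.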
And you are quite wrong about:
One glance at the source code and it's obvious the author doesn't mean single character; he means single byte.
For one, the author of Text::xSV didn't have to think about multi-byte characters. Their module is written in Perl so, unless they do something moderately strange or stupid, multi-byte characters "just work" (provided the user of the module does the little bit of extra work to ensure that Perl has properly decoded, or will properly decode, the strings/streams being given to the module).
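A small sketch of what that "little bit of extra work" buys you, using the core Encode module. The sample data and variable names are invented for illustration; with a file you would more likely put an :encoding(UTF-8) layer on the filehandle instead of calling decode yourself.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw bytes as they might arrive from an undecoded filehandle:
# "a", then the UTF-8 encoding of U+00A7 (two bytes), then "b".
my $bytes = "a\xC2\xA7b";

# Decoding turns the byte string into a character string.
my $chars = decode('UTF-8', $bytes);

# On decoded text the multi-byte separator is one character, so plain
# Perl string and regex operations treat it as a unit.
my @fields = split /\x{A7}/, $chars;

# The same split on the undecoded bytes cuts the separator in half and
# leaves a stray \xC2 byte on the end of the first field.
my @broken = split /\xA7/, $bytes;

printf "bytes: %d, characters: %d, fields: %s\n",
    length($bytes), length($chars), join('|', @fields);
```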
Looking at the code for Text::xSV in some detail, I see that 90% of the uses of the separator character would work completely fine even with a separator composed of more than one multi-byte character. There is one important place where the code would break for a multi-character separator (but that, indeed, continues to work for a separator that is a single multi-byte character):
Now, fixing the unfortunate hard-coding of the quote character is probably quite a simple task. And that would probably be sufficient to make the module work fine on multi-byte quote characters. Certainly much easier than trying to get multi-byte character support into a much more complex XS module.
Because you haven't done the tiny bit of work to fix Text::xSV? Or the small amount of work to write a simple CSV parser in Perl?
No matter. I'm almost done writing my new CSV module.