Re: Semicolon delimited to Comma delimited
by Tux (Canon) on Apr 23, 2015 at 12:29 UTC
|
Disregarding the Microsoft issues with localized list-separator characters, as described here, and given that you already have exported to CSV, be it with another separator than the required ,, the solution is extremely simple, using Text::CSV_XS:
$ cat bad.csv
a;b;c
1;2;3
;4;
"Føó";"Hēłŀõ Ẇőŕļđ";
$ perl -C3 -MText::CSV_XS=csv -we'csv (in => "bad.csv", sep => ";")'
a,b,c
1,2,3
,4,
Føó,"Hēłŀõ Ẇőŕļđ",
Enjoy, Have FUN! H.Merijn
| [reply] [d/l] |
|
| [reply] |
|
Interesting; I'd missed this when I first read the thread.. I've found CSV to be unreliable as a copy/paste source, and generally use TSV which is astoundingly reliable in Excel.
Doesn't solve your problem, of course; you still need to know which semicolons to convert to tabs or commas and so this hasn't helped you at all -- just a note from a fairly heavy Excel user, in case it helps you in the future.
| [reply] |
Re: Semicolon delimited to Comma delimited
by ww (Archbishop) on Apr 23, 2015 at 11:44 UTC
|
Not directly re Win32::OLE (the use of which seems to me [YMMV] "the hard way to achieve your results"), but have you read perldoc perlvar (from your CLI)? It describes the List Separator.
You can read about the Decimal Separator in perldoc perldata.
Converting commas to semicolons (and vice versa) is a trivial job with regexen; making sure that you don't convert non-delimiting instances of either is something your can read about in the results of an appropriate Super Search.
As always, the local value system recommends trying to find answers on your own rather than a posting a question in the form of yours... and rejecting use of well-established (tried-and-true) modules is not likely to enhance the response.
Update: Forgot to mention: Super_Search can also find nodes|threads like Thousand and decimal separators with Spreadsheet::WriteExcel.
| [reply] [d/l] [select] |
|
Thank you for your kind words! I would not be so dumb as to not thoroughly search before i post a question(of course there is always the 2% probability of missing out on some hint or answer given somewhere). If i were looking for Spreadsheet::WriteExcel, I would not post the question either. But if i were to change all my 10 modules written using/for OLE to Spreadsheet just because i could not figure out how to turn a semi colon to comma, well that would be a bummer. Even Google could not help me with this question except for that i could use Text::CSV. I am not completely rejecting the idea of using a new package but if there were a way to do it without using the package it would be betetr for me. Thousand and decimal separators is not a prio for me more than turning a semicolon to comma! I was told by a wise monk not to try split for *.csv files.
| [reply] |
|
| [reply] |
|
Actually, despite my previous comment about split / join, when I wrote my own module to handle CSV, I stepped back a generation and went for character-by-character parsing, and the use of state flags to track things like quote encapsulation and line completion.
The Text::CSV module which existed at the time was grossly underpowered and did not work on some of the simplest CSV files I'd exported from Excel. (I would be remiss to fail to point out that it has matured nicely since.)
I was, at the time, prohibited from working on open source projects (similar to Ben Tilly) without prefacing it by an administrative and legal process that usually took 3-6 months to complete.
I needed working code within a week, with the flexibility to add full functionality on a more relaxed schedule.
So my home-spun CSV module was born.
I coded it as precisely to the specifications I could find, drawing primarily from its Wiki page, and probably also RFC 4180. As is frequently noted and quoted around the Internet, the CSV standard is not perfectly well-defined -- although in my research it became clear that more of it was sufficiently-defined than that for which most give it credit.
I would also point out that, without surprise, Microsoft failed to adhere to one or two items that actually were in the CSV specification in its exports from Excel (whatever version we were using then), which required a few extra edge cases to be written into the module. I think at one point I was even down to considering a user-specified "Microsoft Flag" parameter to direct the parser to either follow the CSV standard or to use what worked with Microsoft Excel; not sure if I found an automated way to handle that corner case or not.
Anyway -- I have on two occasions run into CSV files it did not properly parse, and I have bugs registered in my change control system to address them someday. Alas, it is medium-low on my priority scheme, and has not seen any attention since June of 2008.
Plus, with Tux having written a brilliant alternative Text::CSV_XS module which, knowing Tux, probably was at least as picky about sticking to the specifications as I would have been, my motivation for fixing my own module is pretty low -- the next time I need to decode a CSV file and my own module doesn't handle it, I just might refactor to use Text::CSV_XS.
Anyway, if you could show the code you currently use to parse and/or build SCSVs, someone here might be able to find a quick way to flex it up so it can switch its delimiter without a lot of effort.
| [reply] [d/l] [select] |
Re: Semicolon delimited to Comma delimited
by CountZero (Bishop) on Apr 23, 2015 at 15:10 UTC
|
If I understand you well, your input is a .csv file and your script takes this .csv file to fill an Excel spreadsheet. Due to the fact that .csv files are badly defined and can use any odd character as a delimiter, your Indian colleagues (whose Excel program expects a different delimiter) cannot use your script.The solution is fairly simple. Use Text::CSV to transform the ';' delimiter to ',' before you feed the .csv file to your script. Text::CSV makes it extremely simple to do. Tux already explained to you how to do it. Your Indian colleagues will have to run this "transformation" script once on each "European" .csv file they get, but that is only a small effort.
CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James My blog: Imperial Deltronics
| [reply] |
|
#Get system language id to know German/US format of excels
Win32::API->Import('kernel32.dll', 'long GetSystemDefaultLangID()') or
+ die "Can't import GetSystemDefaultLangID: $^E\n";
$langid = GetSystemDefaultLangID() & 0xFFFF or die "ERROR: LANGID Retu
+rned <undef>\n"; # Mask out the garbage in high-order bytes
#my $langid = GetSystemDefaultLangID();
warn "my language id is :" .$langid;
This could be used at the start of a Perl script to identify if the list separator that comes along with a *.csv is going to be a semi-colon or comma! I do hope this comes in handy to anyone who faces this problem in the future!
Thank you fellow monks for your delightful replies!! I shall post one final comment if this idea works out. If not (i do hope it would), guess the discussion is still open for ideas! | [reply] [d/l] |
|
Have you considered using one of Excels native file formats instead of fighting with CSV? Spreadsheet::WriteExcel creates the old binary Excel 97 format (*.xls), Excel::Writer::XLSX creates the new zipped XML format (*.xlsx). You should use the new format unless you have to use Excel versions from the last century.
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
| [reply] [d/l] [select] |
|
| [reply] [d/l] [select] |
Re: Semicolon delimited to Comma delimited
by igelkott (Priest) on Apr 23, 2015 at 23:35 UTC
|
Add a line with sep=; at the top of the csv file. Excel will read this as an instruction and override the end user's default separator. This should still be readable on your end as well but it's quite possibly specific to Excel.
Minor point but the semicolon variant isn't used everywhere in Europe but I did see it when working with a German company.
| [reply] [d/l] |
|
| [reply] [d/l] |
|
Wow, that's a great response time!
Perhaps obvious but may wish to support sep=, as well, for documents going the other way (and it's nice symmetry). Excel recognizes at least these two but not sure if it'll take anything else. (I guess it'd be a bit silly to make it too general.)
| [reply] [d/l] |
|
|
My respect for you just keeps climbing, Tux. Bravo.
| [reply] |
|
|
| [reply] |
Re: Semicolon delimited to Comma delimited
by marinersk (Priest) on Apr 23, 2015 at 12:20 UTC
|
ww lists your technical options nicely; either use splitand join and prepare to reinvent the wheel for handling edge cases, or look into mucking about with Perl's internal handlers.
That said, this is the first time I've ever heard of using semi-colons instead of commas in a Comma-Separated Values (CSV) file (.csv).
I have, of course, heard of using tabs in a Tab-Separated Values (TSV) file (.tsv).
The pedantic engineer in me is curious why it isn't called an SDV SSV or SCDV SCSV file (Semi-Colon-Delimited-Separated Values File), but I suppose the answer is that when people start tossing terms around, the original context is frequently lost, and whoever made the decision to change the standard either was ignorant of what CSV actually meant, or worked at Microsoft.
| [reply] [d/l] [select] |
|
| [reply] |
|
I empathize; despite what could have been a small rantlette above, I get it.
I am curious, though -- there are many reasons why some people avoid CPAN Modules. If you don't mind my asking, what's yours?
I, for example, would prefer to use them -- but I had an astoundingly lengthy run (near twenty-ish years) of bad luck where either the module didn't exist, was broken, could not be installed, or was adminstratively prohibited.
As a result, I've built up a sizeable collection of home-spun Perl Modules which do most of the work I need.
I've watched the CPAN community evolve very nicely -- potentially the single most respectable collection of collaborate works I've ever seen -- and I sit here and plod along with stuff which I know is either substandard or not as robust as it could be. I know how to use it and it does what I need, and so now it would take more time to refactor into the use of CPAN Modules.
As much as I'd like to, getting heavily into CPAN Modules is not in the cards for me. There's not much time left in my career, and I'm mostly just maintaining stuff written by others long gone while it waits to be replaced by what are perceived to be more modern tools.
LOL -- I've become my own thirty-ton gorilla.
So I'm curious -- what's the cause of your reticence to use external Modules?
| [reply] |
|
|
|