Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Semicolon delimited to Comma delimited

by swatzz (Novice)
on Apr 23, 2015 at 11:08 UTC ( [id://1124380]=perlquestion: print w/replies, xml ) Need Help??

swatzz has asked for the wisdom of the Perl Monks concerning the following question:

Hello wise ones!

I have a question regarding *.csv files. As you are all aware, the European *.csv files use the 'semicolon' and the US or India uses 'comma' delimited file. So basically, i have a Perl script with my own set of modules to run it. I use the European format but it has to be compatible for my end users in India as well. In VBA, i could add for example a macro like this to change the format:

Private Sub loadFile(measurementFile As String) Dim query As QueryTable mSheet.Cells.Clear If (FileOrDirExists(measurementFile)) Then Set query = mSheet.QueryTables.add( _ Connection:="TEXT;" + measurementFile, _ Destination:=mSheet.Cells(1, 1)) With query .TextFileParseType = xlDelimited .TextFileSpaceDelimiter = True .TextFileTabDelimiter = True .TextFileDecimalSeparator = "," .Refresh End With End If mSheet.Columns.AutoFit End Sub

I was wondering if there is something similar in Perl using OLE. I do not wish to use any extra libraries like Text::CSV etc. Is there some way where i can define the 'List separtor' as a comma and the 'Decimal separator' as point!!

Any answers or help is much appreciated folks :) And oh yes! I use Win32::OLE and i was hoping there was some way to define the List separator (I noticed that Win32::OLE::Variant allows for decimal separator but i could not spot anything for List separator)

Replies are listed 'Best First'.
Re: Semicolon delimited to Comma delimited
by Tux (Canon) on Apr 23, 2015 at 12:29 UTC

    Disregarding the Microsoft issues with localized list-separator characters, as described here, and given that you already have exported to CSV, be it with another separator than the required ,, the solution is extremely simple, using Text::CSV_XS:

    $ cat bad.csv
    a;b;c
    1;2;3
    ;4;
    "Føó";"Hēłŀõ Ẇőŕļđ";
    $ perl -C3 -MText::CSV_XS=csv -we'csv (in => "bad.csv", sep => ";")'
    a,b,c
    1,2,3
    ,4,
    Føó,"Hēłŀõ Ẇőŕļđ",
    

    Enjoy, Have FUN! H.Merijn

      Sorry forgot to point out or rather make clear that i am extracting data from *.csv and writing to Excel. And there is this one sheet in Excel which completely copy pastes the contents of *.csv file to Excel. I just call it 'raw data'. This part does not work at all with my Indian counterpart because of this weird 'semi-colon - comma' issue :(

        Interesting; I'd missed this when I first read the thread.. I've found CSV to be unreliable as a copy/paste source, and generally use TSV which is astoundingly reliable in Excel.

        Doesn't solve your problem, of course; you still need to know which semicolons to convert to tabs or commas and so this hasn't helped you at all -- just a note from a fairly heavy Excel user, in case it helps you in the future.

Re: Semicolon delimited to Comma delimited
by ww (Archbishop) on Apr 23, 2015 at 11:44 UTC

    Not directly re Win32::OLE (the use of which seems to me [YMMV] "the hard way to achieve your results"), but have you read perldoc perlvar (from your CLI)? It describes the List Separator.

    You can read about the Decimal Separator in perldoc perldata.

    Converting commas to semicolons (and vice versa) is a trivial job with regexen; making sure that you don't convert non-delimiting instances of either is something your can read about in the results of an appropriate Super Search.

    As always, the local value system recommends trying to find answers on your own rather than a posting a question in the form of yours... and rejecting use of well-established (tried-and-true) modules is not likely to enhance the response.

    Update: Forgot to mention: Super_Search can also find nodes|threads like Thousand and decimal separators with Spreadsheet::WriteExcel.

      Thank you for your kind words! I would not be so dumb as to not thoroughly search before i post a question(of course there is always the 2% probability of missing out on some hint or answer given somewhere). If i were looking for Spreadsheet::WriteExcel, I would not post the question either. But if i were to change all my 10 modules written using/for OLE to Spreadsheet just because i could not figure out how to turn a semi colon to comma, well that would be a bummer. Even Google could not help me with this question except for that i could use Text::CSV. I am not completely rejecting the idea of using a new package but if there were a way to do it without using the package it would be betetr for me. Thousand and decimal separators is not a prio for me more than turning a semicolon to comma! I was told by a wise monk not to try split for *.csv files.

        The solution really boils down to a choice between:

        1. use Text::CSV;
        2. Rewrite Text::CSV in your own words, and then slowly rediscover and re-solve all the unexpected tricky bits while providing support for who knows how long.

        Actually, despite my previous comment about split / join, when I wrote my own module to handle CSV, I stepped back a generation and went for character-by-character parsing, and the use of state flags to track things like quote encapsulation and line completion.

        The Text::CSV module which existed at the time was grossly underpowered and did not work on some of the simplest CSV files I'd exported from Excel. (I would be remiss to fail to point out that it has matured nicely since.)

        I was, at the time, prohibited from working on open source projects (similar to Ben Tilly) without prefacing it by an administrative and legal process that usually took 3-6 months to complete.

        I needed working code within a week, with the flexibility to add full functionality on a more relaxed schedule.

        So my home-spun CSV module was born.

        I coded it as precisely to the specifications I could find, drawing primarily from its Wiki page, and probably also RFC 4180. As is frequently noted and quoted around the Internet, the CSV standard is not perfectly well-defined -- although in my research it became clear that more of it was sufficiently-defined than that for which most give it credit.

        I would also point out that, without surprise, Microsoft failed to adhere to one or two items that actually were in the CSV specification in its exports from Excel (whatever version we were using then), which required a few extra edge cases to be written into the module. I think at one point I was even down to considering a user-specified "Microsoft Flag" parameter to direct the parser to either follow the CSV standard or to use what worked with Microsoft Excel; not sure if I found an automated way to handle that corner case or not.

        Anyway -- I have on two occasions run into CSV files it did not properly parse, and I have bugs registered in my change control system to address them someday. Alas, it is medium-low on my priority scheme, and has not seen any attention since June of 2008.

        Plus, with Tux having written a brilliant alternative Text::CSV_XS module which, knowing Tux, probably was at least as picky about sticking to the specifications as I would have been, my motivation for fixing my own module is pretty low -- the next time I need to decode a CSV file and my own module doesn't handle it, I just might refactor to use Text::CSV_XS.

        Anyway, if you could show the code you currently use to parse and/or build SCSVs, someone here might be able to find a quick way to flex it up so it can switch its delimiter without a lot of effort.

Re: Semicolon delimited to Comma delimited
by CountZero (Bishop) on Apr 23, 2015 at 15:10 UTC
    If I understand you well, your input is a .csv file and your script takes this .csv file to fill an Excel spreadsheet. Due to the fact that .csv files are badly defined and can use any odd character as a delimiter, your Indian colleagues (whose Excel program expects a different delimiter) cannot use your script.

    The solution is fairly simple. Use Text::CSV to transform the ';' delimiter to ',' before you feed the .csv file to your script. Text::CSV makes it extremely simple to do. Tux already explained to you how to do it. Your Indian colleagues will have to run this "transformation" script once on each "European" .csv file they get, but that is only a small effort.

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
      Hey CountZero!

      Thanks! I already did think of that solution but i must also mention it was my last choice! You can see why in my reply to 'marinersk'. But someone gave me a nice idea which could work(i have not tested this yet but could definitely be a light in the dark tunnel of CSV/Excel features!)

      #Get system language id to know German/US format of excels Win32::API->Import('kernel32.dll', 'long GetSystemDefaultLangID()') or + die "Can't import GetSystemDefaultLangID: $^E\n"; $langid = GetSystemDefaultLangID() & 0xFFFF or die "ERROR: LANGID Retu +rned <undef>\n"; # Mask out the garbage in high-order bytes #my $langid = GetSystemDefaultLangID(); warn "my language id is :" .$langid;

      This could be used at the start of a Perl script to identify if the list separator that comes along with a *.csv is going to be a semi-colon or comma! I do hope this comes in handy to anyone who faces this problem in the future!

      Thank you fellow monks for your delightful replies!! I shall post one final comment if this idea works out. If not (i do hope it would), guess the discussion is still open for ideas!

        Have you considered using one of Excels native file formats instead of fighting with CSV? Spreadsheet::WriteExcel creates the old binary Excel 97 format (*.xls), Excel::Writer::XLSX creates the new zipped XML format (*.xlsx). You should use the new format unless you have to use Excel versions from the last century.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        #Get system language id to know German/US format of excels Win32::API->Import('kernel32.dll', 'long GetSystemDefaultLangID()') or + die "Can't import GetSystemDefaultLangID: $^E\n"; $langid = GetSystemDefaultLangID() & 0xFFFF or die "ERROR: LANGID Retu +rned <undef>\n"; # Mask out the garbage in high-order bytes

        Nice idea, but it is not going to work reliably. Quoting the documentation of Text::CSV_XS (by Tux):

        The import/export from Microsoft Excel is a risky task, according to the documentation in Text::CSV::Separator. Microsoft uses the system's list separator defined in the regional settings, which happens to be a semicolon for Dutch, German and Spanish (and probably some others as well). For the English locale, the default is a comma. In Windows however, the user is free to choose a predefined locale, and then change every individual setting in it, so checking the locale is no solution.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Semicolon delimited to Comma delimited
by igelkott (Priest) on Apr 23, 2015 at 23:35 UTC

    Add a line with sep=; at the top of the csv file. Excel will read this as an instruction and override the end user's default separator. This should still be readable on your end as well but it's quite possibly specific to Excel.

    Minor point but the semicolon variant isn't used everywhere in Europe but I did see it when working with a German company.

      Support for a single first line with just sep=; has just been added to Text::CSV_XS. It will be featured in the upcoming 1.17 release.


      Enjoy, Have FUN! H.Merijn

        Wow, that's a great response time!

        Perhaps obvious but may wish to support sep=, as well, for documents going the other way (and it's nice symmetry). Excel recognizes at least these two but not sure if it'll take anything else. (I guess it'd be a bit silly to make it too general.)

        My respect for you just keeps climbing, Tux. Bravo.

      OMG I am SO putting this in my back pocket for future use.

      Thank you!

Re: Semicolon delimited to Comma delimited
by marinersk (Priest) on Apr 23, 2015 at 12:20 UTC

    ww lists your technical options nicely; either use splitand join and prepare to reinvent the wheel for handling edge cases, or look into mucking about with Perl's internal handlers.

    That said, this is the first time I've ever heard of using semi-colons instead of commas in a Comma-Separated Values (CSV) file (.csv).

    I have, of course, heard of using tabs in a Tab-Separated Values (TSV) file (.tsv).

    The pedantic engineer in me is curious why it isn't called an SDV SSV or SCDV SCSV file (Semi-Colon-Delimited-Separated Values File), but I suppose the answer is that when people start tossing terms around, the original context is frequently lost, and whoever made the decision to change the standard either was ignorant of what CSV actually meant, or worked at Microsoft.

      Yea well that's the European format of *.csv files. My script was to extract data from a *.csv file and write into Excel(with multiple data in multiple sheets). It worked fine in my system but did not run with my counterpart in India. And then i figured out that the European system uses the semi-colon instead of comma in *.csv files :(

        I empathize; despite what could have been a small rantlette above, I get it.

        I am curious, though -- there are many reasons why some people avoid CPAN Modules. If you don't mind my asking, what's yours?

        I, for example, would prefer to use them -- but I had an astoundingly lengthy run (near twenty-ish years) of bad luck where either the module didn't exist, was broken, could not be installed, or was adminstratively prohibited.

        As a result, I've built up a sizeable collection of home-spun Perl Modules which do most of the work I need.

        I've watched the CPAN community evolve very nicely -- potentially the single most respectable collection of collaborate works I've ever seen -- and I sit here and plod along with stuff which I know is either substandard or not as robust as it could be. I know how to use it and it does what I need, and so now it would take more time to refactor into the use of CPAN Modules.

        As much as I'd like to, getting heavily into CPAN Modules is not in the cards for me. There's not much time left in my career, and I'm mostly just maintaining stuff written by others long gone while it waits to be replaced by what are perceived to be more modern tools.

        LOL -- I've become my own thirty-ton gorilla.

        So I'm curious -- what's the cause of your reticence to use external Modules?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1124380]
Approved by hdb
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (3)
As of 2024-04-26 07:12 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found