PerlMonks  

Re^2: Columnwise parsing of a file

by ghosh123 (Monk)
on Feb 26, 2013 at 09:51 UTC


in reply to Re: Columnwise parsing of a file
in thread Columnwise parsing of a file

Hi

Thanks for replying! It certainly helps.
But what if the data file has the following content:

Product ID Product Name Product Type Cost
1 TV set Entertainment 10k
In this case, how would I handle the spaces that each cell's content (e.g. col 0 row 1) contains?

Please note the space in the cell name 'Product ID'; it is not 'ProductID'. On the other hand, 'Cost' is the only cell name with no space in it.

Here col 2 row 1 should give me: Product Name
and
col 2 row 2 should give: TV set

Replies are listed 'Best First'.
Re^3: Columnwise parsing of a file
by BrowserUk (Patriarch) on Feb 26, 2013 at 09:58 UTC
    But what if the data file has the following content

    If you have fields with embedded spaces, separated by spaces, and no quoting, you're stuffed.

    Are you producing this file or getting it from someone else?

    Are you sure that the fields are separated by spaces and not tabs?
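
    The tab question is worth checking before anything else. Here is a minimal sketch, in Python rather than Perl, of how one might test a line for tab characters. The function name guess_delimiter and both sample lines are hypothetical; the real file may look different.

```python
def guess_delimiter(line):
    """Return "tab" if the line contains a tab character, else "space"."""
    return "tab" if "\t" in line else "space"

# Hypothetical sample lines: one tab-separated, one space-separated.
sample_tabbed = "1\tTV set\tEntertainment\t10k"
sample_spaced = "1 TV set Entertainment 10k"

print(guess_delimiter(sample_tabbed))
print(guess_delimiter(sample_spaced))

# If the fields really are tab-separated, the embedded spaces are harmless:
print(sample_tabbed.split("\t"))
```

    If the file turns out to be tab-separated, splitting on tabs alone keeps 'TV set' together as one field.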


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^3: Columnwise parsing of a file
by Tux (Canon) on Feb 26, 2013 at 10:00 UTC

    If your data is formatted that liberally, you are completely on your own. There ought to be rules for determining where fields/columns start and end. If there are no rules, you cannot parse. Period.

    Is the current "format" the only possible format? Can the "data" be generated as something that does have rules, like CSV? When the data is well-formatted CSV, you can use Text::CSV_XS to parse the data and use all advice already given, or even easier, use Spreadsheet::Read (in combination with Text::CSV_XS) to get direct access to every "cell" in your dataset.
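
    To illustrate why a format with rules makes the problem go away, here is a small sketch using Python's standard csv module instead of Text::CSV_XS. The CSV text is a hypothetical re-export of the sample data, and the helper cell is made up for this example. It counts columns and rows from 1, to match the expected results given in the question.

```python
import csv
import io

# Hypothetical CSV version of the sample data from this thread.
csv_text = """Product ID,Product Name,Product Type,Cost
1,TV set,Entertainment,10k
"""

# Each row becomes a list of fields; embedded spaces are preserved.
rows = list(csv.reader(io.StringIO(csv_text)))

def cell(rows, col, row):
    """Return the cell at 1-based (col, row), as the question counts them."""
    return rows[row - 1][col - 1]

print(cell(rows, 2, 1))  # Product Name
print(cell(rows, 2, 2))  # TV set
```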


    Enjoy, Have FUN! H.Merijn
Re^3: Columnwise parsing of a file
by tmharish (Friar) on Feb 26, 2013 at 09:56 UTC

    In the absence of field delimiters you need to make use of the structure of your data.

    In this case, is the Product ID always an integer? Is the cost always a single word ( \w+, which already includes digits )? And is the Product Type a single word?

    Without some such pattern, a general solution may not be possible.
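
    If all three assumptions hold (integer Product ID, single-word Product Type, single-word Cost), the data rows can be split by anchoring on both ends and letting the Product Name absorb whatever is left in the middle. This is only a sketch in Python, not a general solution. ROW_PATTERN, HEADER and parse_row are names invented for this example, and the header line has to be hard-coded because it does not follow the pattern.

```python
import re

# Assumed structure: integer ID, free-text name, one-word type, one-word cost.
ROW_PATTERN = re.compile(r"^(\d+)\s+(.+?)\s+(\w+)\s+(\w+)$")

# The header cannot be parsed by the pattern, so it is supplied by hand.
HEADER = ["Product ID", "Product Name", "Product Type", "Cost"]

def parse_row(line):
    """Return the four fields of a data row, or None if the line does not fit."""
    match = ROW_PATTERN.match(line.strip())
    return list(match.groups()) if match else None

print(parse_row("1 TV set Entertainment 10k"))
# ['1', 'TV set', 'Entertainment', '10k']
print(parse_row("Product ID Product Name Product Type Cost"))
# None
```

    A row whose Product Type or Cost contains a space would be split wrongly without any error, which is exactly the risk of relying on assumed structure instead of real delimiters.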

Re^3: Columnwise parsing of a file
by topher (Scribe) on Feb 26, 2013 at 16:15 UTC
    In this case, how would I handle the spaces that each cell's content (e.g. col 0 row 1) contains?

    If you are using spaces as field delimiters, and you have (non-escaped) spaces in your data, you don't handle it. You're asking the wrong question and trying to solve the wrong problem.

    The problem isn't how to parse the data; it's how to get valid data. Data in a format that can't be cleanly parsed is what we usually call garbage data.

    A well-known maxim in the database world (and elsewhere in IT) is "Garbage in, garbage out". If you can't provide good data to process, or come up with a way to clean up your data before processing, you will never get valid, reliable, trustworthy results out.

    Additionally, once you find a way to either get clean data or properly clean up your data, the parsing will likely be much simpler to figure out.

    Christopher Cashell
