Re^2: Columnwise parsing of a file

Replies are listed 'Best First'.
Re^3: Columnwise parsing of a file by BrowserUk (Patriarch) on Feb 26, 2013 at 09:58 UTC
"But what if the data file is having the following content"
If you have fields with embedded spaces, separated by spaces, and no quoting, you're stuffed. Are you producing this file or getting it from someone else? Are you sure that the fields are separated by spaces and not tabs?
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
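One way to answer the tabs-versus-spaces question is to count tab-separated fields on every line and see whether the count is consistent. This is only a rough sketch: the helper name `tab_field_count` and the sample lines are made up for illustration, and a consistent count suggests tab-delimited data without proving it.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number of tab-separated fields when every
# non-empty line has the same count (and more than one field), otherwise 0.
sub tab_field_count {
    my (@lines) = @_;
    my %seen;
    for my $line (@lines) {
        chomp(my $copy = $line);
        next unless length $copy;              # skip blank lines
        my @fields = split /\t/, $copy, -1;    # -1 keeps trailing empty fields
        $seen{ scalar @fields }++;
    }
    my @counts = keys %seen;
    return (@counts == 1 && $counts[0] > 1) ? $counts[0] : 0;
}
```

If this returns a non-zero count for the whole file, `split /\t/` per line is probably all the parsing that is needed, embedded spaces included.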
Re^3: Columnwise parsing of a file by Tux (Canon) on Feb 26, 2013 at 10:00 UTC
If your data is formatted that liberally, you are completely on your own. There ought to be rules for determining where fields/columns start and end. If there are no rules, you cannot parse. Period.
Is the current "format" the only possible format? Can the "data" be generated as something that does have rules, like CSV?
When the data is well-formatted CSV, you can use Text::CSV_XS to parse the data and use all advice already given, or even easier, use Spreadsheet::Read (in combination with Text::CSV_XS) to get direct access to every "cell" in your dataset.
Enjoy, Have FUN! H.Merijn
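For illustration, here is a minimal sketch of pulling one column out of CSV data with Text::CSV_XS. The helper name `csv_column` and the sample rows are assumptions, not anything from the original question; it reads from an in-memory string only to keep the example self-contained.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: return one column (0-based index) from CSV text.
# Quoted fields may contain spaces or commas.
sub csv_column {
    my ($text, $col) = @_;
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my @column;
    while (my $row = $csv->getline($fh)) {    # one array ref per record
        push @column, $row->[$col];
    }
    close $fh;
    return @column;
}
```

Called on two rows such as `101,"Blue Ceramic Mug",Kitchen,450` and `102,"Tea Pot, large",Kitchen,900` with column index 1, this returns the two names intact, embedded spaces and comma included. For a real file, open the file name instead of a reference to a string.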
Re^3: Columnwise parsing of a file by tmharish (Friar) on Feb 26, 2013 at 09:56 UTC
In the absence of field delimiters you need to make use of the structure of your data. In this case, is the Product ID always an integer? Is the cost always a single word ( `[\w\d]+` )? And is the Product type a single word? Without some such pattern, a general solution may not be possible.
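If the answers to those questions are all "yes", the fixed-shape fields can anchor a regex and whatever is left over becomes the free-text field. The sketch below assumes a hypothetical column order (integer ID, a name that may contain spaces, one-word type, one-word cost); the real file may be laid out differently, and `parse_line` is just an illustrative name.

```perl
use strict;
use warnings;

# Assumed layout (hypothetical):
#   <integer id> <name, may contain spaces> <one-word type> <one-word cost>
# Returns a hash ref, or undef when the line does not fit that shape.
sub parse_line {
    my ($line) = @_;
    my ($id, $name, $type, $cost) =
        $line =~ /^\s*(\d+)\s+(.+?)\s+(\w+)\s+(\w+)\s*$/
        or return;
    return { id => $id, name => $name, type => $type, cost => $cost };
}
```

Because the type and cost are each a single word at the end of the line, the name is forced to absorb everything in between. Lines that do not match come back as undef, so they can be logged rather than silently misparsed.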
Re^3: Columnwise parsing of a file by topher (Scribe) on Feb 26, 2013 at 16:15 UTC
"How in this case I would handle the spaces each cell (eg: col 0 row 1 etc) content is having."
If you are using spaces as field delimiters, and you have (non-escaped) spaces in your data, you don't handle it. You're asking the wrong question and trying to solve the wrong problem. The problem isn't how to parse the data, it's how to get valid data.
Data in a format that can't be cleanly parsed is what we usually call garbage data. A well-known maxim in the database world (and elsewhere in IT) is "Garbage in, Garbage out". If you can't provide good data to process, or come up with a way to clean up your data before processing, you will never get valid, reliable, trustworthy results out.
Additionally, once you find a way to either get clean data or properly clean up your data, the parsing will likely be much simpler to figure out.
Christopher Cashell


	PerlMonks