PerlMonks  

Columnwise parsing of a file

by ghosh123 (Monk)
on Feb 25, 2013 at 13:30 UTC

ghosh123 has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

How can I parse a file columnwise? Suppose I have a file with the following content in a matrix-like pattern:

20 30 40
60 70 80
90 100 49


We usually read a file line by line, so we end up parsing it row-wise. But I need to parse it columnwise so that I can randomly know which co-ordinate (say col2, row1) is holding what.
Thanks.

Replies are listed 'Best First'.
Re: Columnwise parsing of a file
by BrowserUk (Patriarch) on Feb 25, 2013 at 13:48 UTC
    I need to parse it columnwise so that I can randomly know which co-ordinate

    Parse it line-by-line (there is no alternative for variable length lines); split each line into fields and build a 2D array from the results.

    Now you can access everything in whatever order you like.
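
    A minimal sketch of that approach follows. The in-memory file, the sample data layout and the `column` helper are illustrative assumptions, not something posted in the thread; a real script would open the actual data file instead.

```perl
use strict;
use warnings;

# Sample data inlined for illustration; a real script would read from a file.
my $text = "20 30 40\n60 70 80\n90 100 49\n";

# Read line by line and split each line on whitespace into a row (array ref).
my @matrix;
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    next unless $line =~ /\S/;            # skip blank lines
    push @matrix, [ split ' ', $line ];
}
close $fh;

# Hypothetical helper: return every value in a zero-based column.
sub column {
    my ($matrix_ref, $col) = @_;
    return map { $_->[$col] } @$matrix_ref;
}

my @second_column = column(\@matrix, 1);
print "col 2: @second_column\n";          # prints "col 2: 30 70 100"
print "col 2, row 1: $matrix[0][1]\n";    # prints "col 2, row 1: 30"
```

    Once the rows are in memory, reading by column is just a matter of fixing the second index and walking the first.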


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Columnwise parsing of a file
by Athanasius (Archbishop) on Feb 25, 2013 at 13:57 UTC

    BrowserUk got in ahead of me. Here is one way to implement his solution:

    #! perl
    use strict;
    use warnings;
    use Data::Dump;

    my @matrix;
    my $row = 0;
    push @{$matrix[$row++]}, split while <DATA>;
    dd @matrix;

    print 'element at col 3, row 2 is ', get_element(\@matrix, 3, 2), "\n";

    sub get_element
    {
        my ($matrix_ref, $col, $row) = @_;
        return $matrix_ref->[$row - 1][$col - 1];
    }

    __DATA__
    20 30 40
    60 70 80
    90 100 49

    Output:

    23:50 >perl 548_SoPW.pl
    ([20, 30, 40], [60, 70, 80], [90, 100, 49])
    element at col 3, row 2 is 80
    23:50 >

    Update: A simpler syntax for populating the array:

    my @matrix;
    push @matrix, [ split ] while <DATA>;

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Hi

      Thanks for replying! It surely helps.
      But what if the data file has the following content:

      Product ID Product Name Product Type Cost
      1 TV set Entertainment 10k
      In this case, how would I handle the spaces that each cell's content contains (e.g. col 0, row 1)?

      Please notice the space in the cell name 'Product ID'; it is not 'ProductID'. On the other hand, 'Cost' is a cell name with no space in it.

      Here col 2 row 1 should give me: Product Name
      and
      col 2 row 2 should give: TV set

        But what if the data file has the following content

        If you have fields with embedded spaces, separated by spaces, and no quoting, you're stuffed.

        Are you producing this file or getting it from someone else?

        Are you sure that the fields are separated by spaces and not tabs?
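
        If the separator does turn out to be a tab, a sketch like the following would keep the embedded spaces intact. The tab-separated sample lines are an assumption made for illustration; the thread does not show which separator the file really uses.

```perl
use strict;
use warnings;

# ASSUMPTION: fields are separated by single tab characters.
my @lines = (
    "Product ID\tProduct Name\tProduct Type\tCost",
    "1\tTV set\tEntertainment\t10k",
);

my @table;
for my $line (@lines) {
    chomp $line;
    # Split on tabs only, so spaces inside a cell survive; the -1 limit
    # keeps trailing empty cells instead of silently dropping them.
    push @table, [ split /\t/, $line, -1 ];
}

# Indices are zero-based: $table[row][col].
print "col 2, row 1: $table[0][1]\n";   # prints "col 2, row 1: Product Name"
print "col 2, row 2: $table[1][1]\n";   # prints "col 2, row 2: TV set"
```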



        If your data is formatted that liberally, you are completely on your own. There ought to be rules for determining where fields/columns start and end. If there are no rules, you cannot parse. Period.

        Is the current "format" the only possible format? Can the "data" be generated as something that does have rules, like CSV? When the data is well-formatted CSV, you can use Text::CSV_XS to parse the data and use all advice already given, or even easier, use Spreadsheet::Read (in combination with Text::CSV_XS) to get direct access to every "cell" in your dataset.
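
        As a rough sketch of the CSV route, assuming the data could be regenerated with commas as separators (the sample lines below are that assumption, not the poster's actual file), Text::CSV_XS can parse each line into fields:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# ASSUMPTION: the data has been regenerated as comma-separated values.
my @lines = (
    'Product ID,Product Name,Product Type,Cost',
    '1,TV set,Entertainment,10k',
);

my $csv = Text::CSV_XS->new({ binary => 1 });

my @table;
for my $line (@lines) {
    # parse() returns false on malformed input.
    $csv->parse($line) or die "Cannot parse line: $line";
    push @table, [ $csv->fields ];
}

# Indices are zero-based: $table[row][col].
print "col 2, row 2: $table[1][1]\n";   # prints "col 2, row 2: TV set"
```

        Parsing line by line like this is enough for simple data; fields containing embedded newlines would need the module's file-handle based reading instead.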


        Enjoy, Have FUN! H.Merijn

        In the absence of field delimiters you need to make use of the structure of your data.

        In this case, is the Product ID always an integer? Is the cost always a single word ( \w+ )? And is the Product Type a single word?

        Without some such pattern, a general solution may not be possible.
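
        To illustrate, here is a sketch that only works if those guesses hold: the ID is an integer, and the type and the cost are single words, so only the name may contain spaces. None of this is confirmed in the thread, and the second sample line is made up for illustration.

```perl
use strict;
use warnings;

# ASSUMPTIONS (not confirmed in the thread): integer ID, single-word type,
# single-word cost. Everything between the ID and the type is the name.
my $record = qr/^ (\d+) \s+ (.+?) \s+ (\S+) \s+ (\S+) \s* $/x;

my @rows;
for my $line ('1 TV set Entertainment 10k', '2 Radio Entertainment 2k') {
    if (my ($id, $name, $type, $cost) = $line =~ $record) {
        push @rows, [ $id, $name, $type, $cost ];
    }
    else {
        warn "Line does not fit the assumed pattern: $line\n";
    }
}

# Indices are zero-based: $rows[row][col].
print "name in first data row: $rows[0][1]\n";   # prints "... TV set"
```

        The header line would not match this pattern (its first field is not an integer), so it would have to be handled separately.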

        In this case, how would I handle the spaces that each cell's content contains (e.g. col 0, row 1)?

        If you are using spaces as field delimiters, and you have (non-escaped) spaces in your data, you don't handle it. You're asking the wrong question and trying to solve the wrong problem.

        The problem isn't how to parse the data, it's how to get valid data. Data in a format that can't be cleanly parsed is what we usually call garbage data.

        A well-known maxim in the database world (and elsewhere in IT) is "Garbage in, garbage out". If you can't provide good data to process, or come up with a way to clean up your data before processing, you will never get valid, reliable, trustworthy results out.

        Additionally, once you find a way to either get clean data or properly clean up your data, the parsing will likely be much simpler to figure out.

        Christopher Cashell
Re: Columnwise parsing of a file
by mhearse (Chaplain) on Feb 25, 2013 at 22:07 UTC
    Let's operate under the assumption that you want to sum column 2:
    awk '{ sum+=$2} END {print sum}' < matrix.txt
    If you want/need to use Perl, and the data in your matrix has fixed length lines... you can unpack it
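
    A sketch of the unpack idea, assuming a hypothetical layout in which every field occupies exactly 4 space-padded characters (the sample in the original question is not laid out this way):

```perl
use strict;
use warnings;

# ASSUMPTION: every field is exactly 4 characters wide, padded with spaces.
my @lines = (
    '20  30  40  ',
    '60  70  80  ',
    '90  100 49  ',
);

# 'A4' extracts 4 characters and strips trailing spaces.
my @matrix = map { [ unpack('A4 A4 A4', $_) ] } @lines;

# Sum column 2, mirroring the awk one-liner.
my $sum = 0;
$sum += $_->[1] for @matrix;
print "sum of column 2: $sum\n";   # prints "sum of column 2: 200"
```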
