PerlMonks  

Re: Help parsing a complicated csv

by linuxer (Curate)
on Apr 25, 2011 at 19:54 UTC ( [id://901244]=note )


in reply to Help parsing a complicated csv

Well, you could have checked the format of your text, because it probably doesn't look like you intended.

If you had used <c></c> tags around your sample data, the format would be visible.

Directly below the form fields where you compose your question are some text and several links that explain how to mark up your question, code and data.

With code tags, your examples could look like this:

your 1st example:

    heading 1, heading 2, heading 3
    data, data, data
    data, data, data

your 2nd example:

    heading 1, heading 2
    data, data
    data,
    heading 3
    data
    heading 4, heading 5
    data, data

Replies are listed 'Best First'.
Re^2: Help parsing a complicated csv
by Tux (Canon) on Apr 26, 2011 at 05:58 UTC

    I didn't even spot the change in field numbers in the OP! If that is a real criterion:

    use Text::CSV_XS;

    # $fh is assumed to be a filehandle already opened on the CSV file
    my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
    my @hdr;
    while (my $row = $csv->getline ($fh)) {
        # example:
        if (
            # a change in number of columns
            @hdr != @$row or
            # first column matches header criterion
            $row->[0] =~ m/^[A-Z]/
        ) {
            @hdr = @{$row};
            next;
        }

        # just an example
        my %hash;
        @hash{@hdr} = @$row;
    }
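
    To see what the hash slice on the last line does for a single row, here is a minimal standalone sketch. The header names and row values are made up for illustration and are not taken from the OP's file:

```perl
use strict;
use warnings;

# Sample header and data row (hypothetical values for illustration only).
my @hdr = ( 'heading 1', 'heading 2', 'heading 3' );
my $row = [ 'x', 'y', 'z' ];

# Hash slice: each element of @$row is stored under the key
# found at the same index in @hdr.
my %hash;
@hash{@hdr} = @$row;

# Print the pairs in header order.
print "$_ => $hash{$_}\n" for @hdr;
```

    Note that %hash here describes one row only; collecting whole columns needs a structure that survives across loop iterations.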

    Enjoy, Have FUN! H.Merijn
      It worked! I think that I should be able to get the rest of what I need from here. Thanks a ton!!

      I may have spoken too soon. It appears that this is pulling my data and structuring it correctly when I output to a txt file.

      However, I need to be able to pull specific columns and I'm not sure how to do that. I want to be able to do something like:

      print $hash->{header_name} and have it give me all of the values under that column.

      So again, I have columns in a csv file that are kind of "stacked" on top of each other, meaning there is not a single header row at the top of the file; the headers for the columns are on different lines of the file.

      The headers are always enclosed in <>, so I want to scan through my csv, pull out the headers, and then put the corresponding values in each column into a hash that I can read out by doing something like $hash->{header}. This would give me all of the values in the column.

      Sometimes there are 100 rows under a header and sometimes there is just 1. Thanks again for your help, and sorry if this doesn't make sense; it is kind of confusing.

      Let me try to explain again what the csv looks like...

          <header>, <header>, <header>
          value, value, value
          value, value
          value, value,
          <header>, <header>
          value, value
          value, value
          value,

      This goes on like this randomly throughout the csv file. I hope that this little picture makes some more sense. I think that we are on the right track but not there yet! Thanks again for everyone's help!

        Hi,

        I just read this thread again and saw your reply.

        Assuming that an empty string is not a valid value, I came up with this:

        #! /usr/bin/perl
        use strict;
        use warnings;

        use Text::CSV_XS;

        my $csv = Text::CSV_XS->new({
            binary           => 1,
            allow_whitespace => 1,
        }) or die "Cannot use CSV: " . Text::CSV_XS->error_diag();

        # for testing; in real world, open file and use that handle
        my $fh = \*DATA;

        my (%hash, @hdr);

        while ( my $row = $csv->getline( $fh ) ) {

            # header not yet defined? or 1st cell starts with '<' ==> use row as header
            if ( !@hdr || $row->[0] =~ m/^</ ) {
                @hdr = @{$row};
                next;
            }
            # otherwise try to process data
            else {
                for my $i ( 0 .. $#hdr ) {
                    # only add those values which contain at least one character
                    # so: no "undef"s or empty strings in result
                    # if empty strings are OK or wanted, try to replace length() with defined()
                    push @{ $hash{$hdr[$i]} }, ( length $row->[$i] ? $row->[$i] : () );
                }
            }
        }

        # check created data structure
        require Data::Dumper;
        $Data::Dumper::Sortkeys = 1;
        print Data::Dumper::Dumper( \%hash );

        __DATA__
        <A1>, <A2>, <A3>
        a1, aa1, aaa1
        a2, , aaa2
        a3, aa3
        a4
        <B1>, <B2>
        b1, bb1
        b2, bb2
        b3

        That produced a result like this:
        $VAR1 = {
                  '<A1>' => [
                              'a1',
                              'a2',
                              'a3',
                              'a4'
                            ],
                  '<A2>' => [
                              'aa1',
                              'aa3'
                            ],
                  '<A3>' => [
                              'aaa1',
                              'aaa2'
                            ],
                  '<B1>' => [
                              'b1',
                              'b2',
                              'b3'
                            ],
                  '<B2>' => [
                              'bb1',
                              'bb2'
                            ]
                };
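
        With a hash of array references like the one dumped above, reading out one column is a matter of dereferencing the array stored under that header. A minimal sketch follows; the %hash below is a hand-built partial copy of the dumped structure, and column_values() is a hypothetical helper name, not part of any module:

```perl
use strict;
use warnings;

# Hand-built partial copy of the dumped structure (an assumption for this
# sketch; in the real script %hash is filled by the getline loop).
my %hash = (
    '<A1>' => [ 'a1', 'a2', 'a3', 'a4' ],
    '<A2>' => [ 'aa1', 'aa3' ],
    '<B1>' => [ 'b1', 'b2', 'b3' ],
);

# column_values() is a hypothetical helper: it returns the list of values
# stored under one header, or an empty list if the header is unknown.
sub column_values {
    my ($data, $header) = @_;
    return @{ $data->{$header} || [] };
}

my @a1 = column_values( \%hash, '<A1>' );
print "<A1>: @a1\n";
print "count: ", scalar(@a1), "\n";
```

        The same values can also be reached directly with @{ $hash{'<A1>'} }; the helper only adds a guard so that an unknown header gives an empty list instead of an error under strict refs.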
