build hash from csv file

This function builds and returns a hash when passed a filename and, optionally, the number of fields. If the number of fields is omitted, it makes a rudimentary best guess: it takes the number of fields in the first line and compares every later line against that count. It would be called like this: `my %hash = build_hash("somefile.csv", "10");`

This is from a work in progress which takes a CSV file, pulls out the fields specified by the user, and drops them into another file. This other file is processed by mailing software. Once the mailing software is finished, it spits out a CSV file, and the script puts it all back together again with the new data.

use Carp;    # provides confess

#############
sub build_hash
#############
{
    my ( $file_, $numfields_ ) = @_;
    my $line                   = 0;
    my ( %hash, $csvfile, $errorfile );

    # Three-argument open, so the filename is never read as an open mode
    open $csvfile, "<", $file_ or
        confess "Unable to open $file_: $!\n";
    open $errorfile, ">", "${file_}.err" or
        confess "Unable to open ${file_}.err: $!\n";

    # Read one line at a time instead of loading the whole file
    while (<$csvfile>)
    {
        chomp($_);
        s/"//g;

        $line++;

        my (@linedata) = split /,/, $_;

        # Make a best guess (using line 1)
        if ( $line == 1 and ! defined ( $numfields_ ) )
        {
            $numfields_ = scalar(@linedata);
        }

        # Lines with the wrong number of fields go to the error file
        unless ( scalar(@linedata) == $numfields_ )
        {
            print $errorfile "$_\n";
            next;
        }

        my $fieldnum = 0;

        # Fields are keyed 1..N under the line number
        for my $info (@linedata)
        {
            $fieldnum++;
            $hash{$line}{$fieldnum} = $info;
        }
    }
    close ($csvfile);
    close ($errorfile);
    return %hash;
}
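
The returned structure is keyed by line number and then by 1-based field number, and rejected lines leave gaps in the line numbers. As a rough sketch of the "pull out the fields specified by the user" step described above, the following walks that structure and keeps the line number with each row so the data can be matched up again later. The sample data and the `pick_fields` helper are made up for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical sample data in the shape build_hash returns:
# line number => { field number (1-based) => value }
my %hash = (
    1 => { 1 => 'name',  2 => 'street',     3 => 'city'   },
    2 => { 1 => 'Alice', 2 => '1 Main St',  3 => 'Albany' },
    4 => { 1 => 'Bob',   2 => '22 Elm Ave', 3 => 'Boston' },  # line 3 was rejected
);

# pick_fields is a hypothetical helper. It returns one array ref per
# line, in line order: the line number followed by the wanted fields.
sub pick_fields {
    my ( $data, @wanted ) = @_;
    my @rows;
    for my $line ( sort { $a <=> $b } keys %$data ) {
        # Hash slice on the inner hash pulls all wanted fields at once
        push @rows, [ $line, @{ $data->{$line} }{@wanted} ];
    }
    return @rows;
}

for my $row ( pick_fields( \%hash, 1, 3 ) ) {
    my ( $line, @fields ) = @$row;
    print "$line: ", join( ',', @fields ), "\n";
}
```

Carrying the original line number along is one possible way to merge the mailing software's output back in afterwards; whether that fits depends on whether the software preserves row order or an identifier.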

Replies are listed 'Best First'.
Re: build hash from csv file by davorg (Chancellor) on Sep 19, 2005 at 13:21 UTC
Hope you don't mind a few comments on your code.

Firstly, just splitting on commas and removing double quotes is only going to work on the most basic CSV files. Much better to look at using Text::ParseWords (which comes with Perl) or Text::CSV_XS (which doesn't). Both of these will handle more complex CSV files than your code does.

Secondly, it seems a bit wasteful to loop round your @linedata array copying each element separately into a second-level hash. In fact I'd question the use of a hash there at all. If you're using a hash whose keys are low-numbered integers, then you're much better off using an array. And as you've already got an array, you can just store a reference to that in your data structure.

`$hash{$lineno} = \@linedata;`

Hope this is useful.

-- <http://dave.org.uk> "The first rule of Perl club is you do not talk about Perl club." -- Chip Salzenberg
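
To make the two points above concrete, here is a minimal sketch using `parse_line` from Text::ParseWords on a made-up sample line, next to the strip-and-split approach from the original post. The sample line and variable names are assumptions for illustration. Note that Text::ParseWords is a general quoted-word parser and not a dedicated CSV parser (it also gives single quotes and backslashes special meaning), so a CSV module is likely the safer choice for real data.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical sample line: the second field has a comma inside quotes.
my $line = '1001,"Smith, John",Springfield';

# The approach from the original post: strip quotes, then split on commas.
( my $stripped = $line ) =~ s/"//g;
my @naive = split /,/, $stripped;

# parse_line( delimiter, keep, line ); a false "keep" drops the quotes.
my @parsed = parse_line( ',', 0, $line );

printf "naive:  %d fields\n", scalar @naive;
printf "parsed: %d fields\n", scalar @parsed;

# Store a reference to the row instead of copying field by field.
# The anonymous array copy keeps the stored row independent of @parsed.
my %hash;
my $lineno = 1;
$hash{$lineno} = [@parsed];
print "field 2 of line $lineno: $hash{$lineno}[1]\n";
```

With this sample, the naive split sees four fields where the line really has three, so a field-count check like the one in `build_hash` would send a perfectly valid line to the error file.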
Re^2: build hash from csv file by tcf03 (Deacon) on Sep 19, 2005 at 13:24 UTC
Don't mind at all. I was looking at Text::CSV::Simple; I'll look at the others now, though. I was just a bit perplexed as to how to merge all the data back together from different files with unknown fields. Thanks for your input.

Ted -- "That which we persist in doing becomes easier, not that the task itself has become easier, but that our ability to perform it has improved." --Ralph Waldo Emerson
Re: build hash from csv file by dragonchild (Archbishop) on Sep 19, 2005 at 13:29 UTC
First off, the hash you're building is ... wonky. Why not just use an array of arrays instead of a hash of hashes? All your keys are numbers ... that indicates you really want an array, not a hash.

Second, unless your CSV format is vastly different from the ones I'm used to, your `s/"//g;` will LOSE information, badly. Like, it'd be trivial to make a correct CSV file that would make your code break. Much better is to let a CPAN module do this for you. I like tilly's Text::xSV best, but Text::CSV_XS is perfectly acceptable. (Text::CSV isn't feature-complete.)

    use Text::xSV;

    sub build_hash {
        my ($file_) = @_;

        my $reader = Text::xSV->new();
        $reader->open_file( $file_ );

        my @result;
        while ( my $row = $reader->get_row() ) {
            push @result, $row;
        }

        return @result;
    }

If you absolutely have to have a hash of hashes, then you could do something like:

    use Text::xSV;

    sub build_hash {
        my ($file_) = @_;

        my $reader = Text::xSV->new();
        $reader->open_file( $file_ );

        my %result;
        my $line = 0;
        while ( my $row = $reader->get_row() ) {
            # @{ $result{ ++$line } }{ 1 .. scalar(@$row) } = @$row;
            $line++;
            for my $i ( 1 .. @$row ) {
                $result{$line}{$i} = $row->[ $i-1 ]
            }
        }

        return %result;
    }

That commented-out wonky line is a hash slice. It's exactly equivalent to (but faster than) the 4 lines below it.

My criteria for good software: Does it work? Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
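
The hash-slice claim can be checked without a CSV file or Text::xSV. The sketch below builds the same hash of hashes both ways from made-up rows and compares the results. The sample rows and the `flatten` helper are hypothetical, and it only demonstrates that the two forms produce the same structure; it makes no attempt to measure speed.

```perl
use strict;
use warnings;

# Hypothetical rows, standing in for what a CSV reader would return.
my @rows = ( [ 'a', 'b', 'c' ], [ 'd', 'e', 'f' ] );

my ( %by_slice, %by_loop );
my $line = 0;

for my $row (@rows) {
    $line++;

    # Hash slice: assign keys 1..N of the inner hash in one statement.
    @{ $by_slice{$line} }{ 1 .. scalar(@$row) } = @$row;

    # Explicit loop doing the same assignment one field at a time.
    for my $i ( 1 .. @$row ) {
        $by_loop{$line}{$i} = $row->[ $i - 1 ];
    }
}

# flatten is a hypothetical helper: it turns a hash of hashes into one
# sorted string so that two structures can be compared with eq.
sub flatten {
    my ($h) = @_;
    my @parts;
    for my $l ( sort { $a <=> $b } keys %$h ) {
        for my $f ( sort { $a <=> $b } keys %{ $h->{$l} } ) {
            push @parts, "$l.$f=$h->{$l}{$f}";
        }
    }
    return join ';', @parts;
}

my $same = flatten( \%by_slice ) eq flatten( \%by_loop );
print $same ? "same\n" : "different\n";
```

The slice works on `$by_slice{$line}` even though that inner hash does not exist yet, because assigning through the dereference creates it on the fly.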

