dana has asked for the wisdom of the Perl Monks concerning the following question:


This may be fairly easy but I am not seeing the solution.

I'm parsing and viewing data from a comma-delimited file using the following (a subset of the total program):

my @data = split(/,/, $line);
foreach my $i ( 0..$#data ) {
    $dataPos{$data[$i]} = $i;
    print "$data[$i] -- > $i\n";
}
print Dumper ( \%dataPos );
The first print within the 'foreach' loop yields (please ignore the '' values -- I'll fix those soon):
WELL -- > 0
ENTRY ID -- > 1
CREATOR -- > 3
SAMPLE ID -- > 5
RESULT -- > 10
QUALITY -- > 11
EDITED -- > 12
-- > 13
RESULT -- > 16
QUALITY -- > 17
EDITED -- > 18
-- > 19
-- > 20

while a 'Dumper' printout of the %dataPos hash yields:
$VAR1 = {
'RESULT' => 16,
'CREATOR' => 3,
' ' => 20,
'WELL' => 0,
'SAMPLE ID' => 5,
'QUALITY' => 17,
'ENTRY ID' => 1,
'EDITED' => 18,

Could someone please tell me what happened to values 9-13 in the hash, and why these two are not equivalent?

Thank you!!

Re: hash and array mismatch
by BrowserUk (Patriarch) on May 08, 2007 at 02:05 UTC

    A hash cannot have duplicate keys, whereas an array can hold duplicate values. Hence, the key POSITION NAME is first given the value 9, but six positions later the value associated with that key is overwritten with 15. Likewise, the key RESULT is initially set to 10 but later overwritten with 16. The same happens to all the other missing values.
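A minimal sketch of the overwrite described above, using a shortened, hypothetical header row rather than the OP's real file. The loop is the OP's, minus the print.

```perl
use strict;
use warnings;

# Hypothetical header row containing repeated names.
my @data = ('WELL', 'POSITION NAME', 'RESULT', 'POSITION NAME', 'RESULT');

my %dataPos;
foreach my $i (0 .. $#data) {
    # Assigning to a key that already exists replaces the earlier value.
    $dataPos{ $data[$i] } = $i;
}

# Five array elements, but only three distinct keys survive.
printf "%d elements, %d keys\n", scalar(@data), scalar(keys %dataPos);
print "POSITION NAME => $dataPos{'POSITION NAME'}\n";   # the later index, 3
```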

    What you can do about it depends upon what you want to do with the hash afterwards. It may be as simple as reversing the keys and values. The field position will always be unique, so you could use that as the key and the text as the value, but that makes the hash little more useful than the array.
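A quick sketch of the reversed mapping, again on a hypothetical header row; `%nameAt` is an illustrative name, not something from the OP's program. It keeps every column, but it is really just the array again.

```perl
use strict;
use warnings;

# Hypothetical header row with a repeated name.
my @data = ('WELL', 'RESULT', 'QUALITY', 'RESULT');

# Key by position instead of by name: positions are unique, so nothing is lost.
my %nameAt = map { $_ => $data[$_] } 0 .. $#data;

# ...but $nameAt{$i} gives exactly what $data[$i] already gives.
for my $i (sort { $a <=> $b } keys %nameAt) {
    print "$i -- > $nameAt{$i}\n";
}
```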

    Perhaps what you want is

    foreach my $i ( 0..$#data ) {
        push @{ $dataPos{ $data[$i] } }, $i;
        print "$data[$i] -- > $i\n";
    }

    Which will accumulate an array of positions associated with each value. It would dump something like

    $VAR1 = {
      ' TOTAL QUALITY' => [ 8, 14 ],
      'RESULT' => [ 10, 16 ],
      'CREATOR' => [ 3 ],
      ' ' => [ 13, 19, 20 ],
      'TOTAL QUALITY' => [ 8 ],
      'WELL' => [ 0 ],
      'SAMPLE ID' => [ 5 ],
      'DISPENSATION ORDER' => [ 2 ],
      'POSITION NAME' => [ 9, 15 ],
      'QUALITY' => [ 11, 17 ],
      'DESCRIPTION' => [ 6 ],
      'ENTRY ID' => [ 1 ],
      'CREATION DATE' => [ 4 ],
      'EDITED' => [ 12, 18 ],
      'RUN ID AND DATE' => [ 7 ]
    };
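Once each hash value is an array of positions, lookups have to dereference that array. A small sketch with a hypothetical header row; `@resultCols`, `$firstResult` and `$nResult` are illustrative names.

```perl
use strict;
use warnings;

# Hypothetical header row; RESULT and QUALITY each appear twice.
my @data = ('WELL', 'RESULT', 'QUALITY', '', 'RESULT', 'QUALITY');

# Accumulate every position for each header name.
my %dataPos;
foreach my $i (0 .. $#data) {
    push @{ $dataPos{ $data[$i] } }, $i;
}

# All columns called RESULT, the first of them, and how many there are.
my @resultCols  = @{ $dataPos{'RESULT'} };          # (1, 4)
my $firstResult = $dataPos{'RESULT'}[0];            # 1
my $nResult     = scalar @{ $dataPos{'RESULT'} };   # 2

# Report the headers that occur more than once.
for my $name (sort keys %dataPos) {
    my @pos = @{ $dataPos{$name} };
    print "'$name' at @pos\n" if @pos > 1;
}
```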

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Thank you so very much.

      As said, I am parsing a data file and was taking it for granted that the headers were unique, but apparently there are multiple columns with the same name.

      Thank you again.

Re: hash and array mismatch
by GrandFather (Saint) on May 08, 2007 at 02:07 UTC

    You are using the same key for multiple values. You will only retain the last value assigned to the hash for a given key. For example, 'TOTAL QUALITY' has values 8 and 14; only 14 is retained.

    Most likely a hash is not appropriate for whatever you are trying to achieve. You could however:

    use warnings;
    use strict;
    use Data::Dump::Streamer;

    my @data = <DATA>;
    my %dataPos;

    chomp @data;
    push @{$dataPos{$data[$_]}}, $_ for 0 .. $#data;
    Dump (\%dataPos);

    __DATA__
    WELL
    TOTAL QUALITY
    POSITION NAME

    TOTAL QUALITY
    POSITION NAME


    $HASH1 = {
      ""              => [ 3 ],
      "POSITION NAME" => [ 2, 5 ],
      "TOTAL QUALITY" => [ 1, 4 ],
      WELL            => [ 0 ]
    };

    Which creates a list of positions for each key value.

    DWIM is Perl's answer to Gödel

      Very interesting!

      At first glance your code didn't look right. So I read more carefully and saw what was bothering me:

      push @{$dataPos{$data[$_]}}, $_ for 0 .. $#data;

      You are pushing data onto an undefined value that is dereferenced as an array! The script should (I thought) die the first time through the for loop. But it works. So I did some experiments and read the docs.

      I found out that you can do this:

      my $foo;                 # $foo is undefined
      push @$foo, 23;
      print "I found @{$foo}\n";

      Further testing showed that:

      push @{ undef() }, 99;
      Died as I had expected.

      Further tests show that unshift works the same way. Also, pop and shift will create array references.

      my $foo = undef;
      shift @{$foo};
      print "Foo: $foo\n";     # prints Foo: ARRAY(0x12121212)

      The various docs don't mention this special behavior. Thanks for the education.

      TGI says moo

        It is called "autovivification". Have a read through perlreftut and perlref.

        You have seen it happen in a lightweight way whenever you assign a value to an array element that hadn't been used before. It is a little more interesting in the case of a reference which you are using for the first time, but it is one of the underpinnings of creating and managing "interesting" data structures in Perl, as indeed was the case here.
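A short demonstration of autovivification in the cases discussed in this thread: a plain undefined scalar, a hash element, and a nested structure. The variable names are illustrative only.

```perl
use strict;
use warnings;

# An undefined scalar used as an array reference in a modifying context
# (push) is first turned into a reference to a new, empty array.
my $foo;                                  # undefined
push @$foo, 23;                           # $foo now holds an array reference
print ref($foo), " holding @$foo\n";

# The same happens to hash elements: the key and its array spring into being.
my %dataPos;
push @{ $dataPos{'RESULT'} }, 10, 16;
print "RESULT => @{ $dataPos{'RESULT'} }\n";

# Intermediate levels are created as needed when you assign deep inside.
my %tree;
$tree{well}{tests}[1] = 'second';         # creates $tree{well} and its 'tests' array
my $count = @{ $tree{well}{tests} };      # 2: element 0 was never assigned (undef)
print "tests has $count slots\n";
```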

        My pleasure, education is a large part of what PerlMonks is about.

        DWIM is Perl's answer to Gödel
Re: hash and array mismatch
by TOD (Friar) on May 08, 2007 at 02:07 UTC
    That's fairly easy to answer: some of the hash keys are identical. You might work around this bug with something like the following:
    my %keys = keys %data;
    my $hkey = $data{$i};
    if ((my @found = grep /^$hkey$/, %keys) > 1) {
        $hkey .= '_';
    }
    $dataPos{$hkey} = $i;
    masses are the opiate for religion.

      I assume your snippet is intended to go inside the OP's for loop:

      foreach my $i ( 0..$#data ) {
          my %keys = keys %dataPos;  # Changed hash name to match hash OP is building
          my $hkey = $data[$i];      # Changed data lookup to array to match OP code.
          if ((my @found = grep /^$hkey$/, %keys) > 1) {
              $hkey .= '_';
          }
          $dataPos{$hkey} = $i;
          print "$data[$i] -- > $i\n";
      }

      After I tweaked your code to work with the OP's code, there are still problems. You have an unnecessary nested loop (the grep) whose cost grows with each pass through the outer loop. Also, if a key shows up three or more times, the last value will overwrite the value stored for the second key.

      I've fixed these issues below.

      foreach my $i ( 0..$#data ) {
          my $key = $data[$i];
          # append '_' until no matching key exists.
          while ( exists $dataPos{$key} ) {
              $key .= '_';
          }
          $dataPos{$key} = $i;
          print "$key -- > $i\n";
      }

      Even after it's been fixed, I still wouldn't use this approach. I'd store repeated values in arrays as other posters have suggested.

      TGI says moo

Re: hash and array mismatch
by Moron (Curate) on May 08, 2007 at 09:56 UTC
    Although it's clear that the lack of chop or chomp is putting incorrect (duplicate) garbage values in your hash, the point is that once you address the "", there is a fair chance that the problem will go away anyway -- see Text::CSV::Simple for an example solution where it definitely will!

    So I'd quickly move on to the quotes issue and not spend time on the issue as presented -- it might never happen!
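A core-Perl sketch of the clean-up Moron alludes to, assuming the header line ends in a newline (possibly preceded by a carriage return) and that every field is wrapped in double quotes. The input line is hypothetical. This naive split breaks on quoted fields that contain commas; for real CSV input a module such as Text::CSV, or the Text::CSV::Simple module mentioned above, is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical header line: quoted fields, ending in "\r\n" as a DOS-style
# file read on a Unix system would.
my $line = qq{"WELL","ENTRY ID","RESULT",""\r\n};

chomp $line;          # strip the trailing "\n" (the input record separator)
$line =~ s/\r\z//;    # strip a leftover carriage return, if present

# Naive split: assumes no field contains an embedded comma or escaped quote.
# The -1 limit keeps trailing empty fields instead of discarding them.
my @data = map { (my $f = $_) =~ s/\A"(.*)"\z/$1/s; $f } split /,/, $line, -1;

print join('|', @data), "\n";
```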


    ^M Free your mind!

Re: hash and array mismatch
by jhourcle (Prior) on May 08, 2007 at 14:15 UTC

    Lots of other people have already explained what the problem is ... however, in this particular case, it looks as if the data is denormalized, as there are two instances of the columns from 'TOTAL QUALITY' to 'EDITED'.

    It looks to me as if the occurrence of a blank header is a sign that the next grouping starts, but I can't be sure with only two groups.

    From the looks of things, you're looking at records from a health department that tracks well water, and you have two tests per well. Unfortunately, without knowing what you're trying to do with this data, I can't really make any recommendation on how best to deal with it in a meaningful way.
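If jhourcle's guess is right and a blank header marks the start of each new group, one option is to keep a separate name-to-position hash per group. This is only a sketch under that assumption, using a hypothetical header row; `@groups` is an illustrative name. Runs of blank headers do not create empty groups.

```perl
use strict;
use warnings;

# Hypothetical header row: some shared columns, then two repeated groups,
# each group assumed to be separated by a blank header.
my @data = ('WELL', 'SAMPLE ID', '',
            'RESULT', 'QUALITY', ' ',
            'RESULT', 'QUALITY', '');

# Each group maps header name => column position.
my @groups = ({});
for my $i (0 .. $#data) {
    (my $name = $data[$i]) =~ s/^\s+|\s+$//g;   # trim; whitespace-only becomes ''
    if ($name eq '') {
        # A blank header starts a new group, unless the current one is still empty.
        push @groups, {} if %{ $groups[-1] };
        next;
    }
    $groups[-1]{$name} = $i;   # names are assumed unique within one group
}
pop @groups unless %{ $groups[-1] };   # drop a trailing empty group

for my $g (0 .. $#groups) {
    my $cols  = $groups[$g];
    my @names = sort { $cols->{$a} <=> $cols->{$b} } keys %$cols;
    print "group $g: ", join(', ', map { "$_ => $cols->{$_}" } @names), "\n";
}
```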

Re: hash and array mismatch
by Anonymous Monk on May 09, 2007 at 19:09 UTC
    The keys corresponding to 9-13 were repeated later in your $line, which resulted in an overwrite of the values corresponding to those keys.

    For example, from the first set of print statements, position 9 corresponds to "POSITION NAME", which would look something like:

    $dataPos{"POSITION NAME"} = 9;
    A little further in your list you'll see "POSITION NAME" at position fifteen; the assignment statement for this would look something like
    $dataPos{"POSITION NAME"} = 15;
    Since this statement is executed after the first, your 9 disappears, and only 15 is stored as the value for the key of "POSITION NAME".

    Repeat the above for entries 10-13, and you should see those values got overwritten.

    If you want to keep all this data, you might consider making the integers your keys, and saving the text as the values. Of course, you already did that when you created the @data array.