dana has asked for the wisdom of the Perl Monks concerning the following question:
HI!!
This may be fairly easy but I am not seeing the solution.
I'm parsing and viewing data from a comma delimited file using the following (subset of total program):
my @data = split(/,/, $line);
foreach my $i ( 0..$#data ) {
$dataPos{$data[$i]} = $i;
print "$data[$i] -- > $i\n";
}
print Dumper ( \%dataPos );
The first print within the 'foreach' loop yields (please ignore the '' values -- I'll fix those soon):
WELL -- > 0
ENTRY ID -- > 1
DISPENSATION ORDER -- > 2
CREATOR -- > 3
CREATION DATE -- > 4
SAMPLE ID -- > 5
DESCRIPTION -- > 6
RUN ID AND DATE -- > 7
TOTAL QUALITY -- > 8
POSITION NAME -- > 9
RESULT -- > 10
QUALITY -- > 11
EDITED -- > 12
-- > 13
TOTAL QUALITY -- > 14
POSITION NAME -- > 15
RESULT -- > 16
QUALITY -- > 17
EDITED -- > 18
-- > 19
-- > 20
while a 'Dumper' print out of the dataPos hash yields:
$VAR1 = {
' TOTAL QUALITY' => 14,
'RESULT' => 16,
'CREATOR' => 3,
' ' => 20,
'TOTAL QUALITY' => 8,
'WELL' => 0,
'SAMPLE ID' => 5,
'DISPENSATION ORDER' => 2,
'POSITION NAME' => 15,
'QUALITY' => 17,
'DESCRIPTION' => 6,
'ENTRY ID' => 1,
'CREATION DATE' => 4,
'EDITED' => 18,
'RUN ID AND DATE' => 7
};
Could someone please tell me what happened to values 9 - 13 in the hash, and why these two are not equivalent?
Thank you!!
Re: hash and array mismatch
by BrowserUk (Patriarch) on May 08, 2007 at 02:05 UTC
A hash cannot have duplicate keys, whereas an array can. Hence, the key POSITION NAME is given a value of 9, but then 6 lines later the value associated with that key gets overwritten with the value 15. And the key RESULT is initially set to the value 10, but is later overwritten with the value 16. And so on for all the other missing values.
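The overwriting can be seen in isolation with a minimal sketch. The three-element list here is made up for illustration and is not the OP's data:

```perl
use strict;
use warnings;

# Hypothetical header list with one repeated name.
my @data = ('WELL', 'RESULT', 'RESULT');

my %dataPos;
$dataPos{ $data[$_] } = $_ for 0 .. $#data;

# Three array elements go in, but only two distinct keys survive.
printf "%d elements, %d keys\n", scalar @data, scalar keys %dataPos;

# The later index (2) has replaced the earlier one (1).
print "RESULT => $dataPos{RESULT}\n";
```

The array still holds all three entries; only the hash has lost information.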
What you can do about it depends upon what you want to do with the hash afterwards. It may be as simple as reversing the keys and values. The line number (position) will always be unique, so you could use that as the key and the text as the value, but that makes it little more useful than the array.
Perhaps what you want is
foreach my $i ( 0..$#data ) {
push @{ $dataPos{ $data[$i] } }, $i;
print "$data[$i] -- > $i\n";
}
Which will accumulate an array of positions associated with each value. It would dump something like
$VAR1 = {
' TOTAL QUALITY' => [ 14 ],
'RESULT' => [ 10, 16 ],
'CREATOR' => [ 3 ],
' ' => [ 13, 19, 20 ],
'TOTAL QUALITY' => [ 8 ],
'WELL' => [ 0 ],
'SAMPLE ID' => [ 5 ],
'DISPENSATION ORDER' => [ 2 ],
'POSITION NAME' => [ 9, 15 ],
'QUALITY' => [ 11, 17 ],
'DESCRIPTION' => [ 6 ],
'ENTRY ID' => [ 1 ],
'CREATION DATE' => [ 4 ],
'EDITED' => [ 12, 18 ],
'RUN ID AND DATE' => [ 7 ]
};
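Reading the positions back out of such a hash of arrays might look like the following sketch. The shortened header list is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical, shortened header list with repeats.
my @data = ('WELL', 'RESULT', 'QUALITY', 'RESULT', 'QUALITY');

my %dataPos;
push @{ $dataPos{ $data[$_] } }, $_ for 0 .. $#data;

# Each value is an array reference; dereference it to get every position.
for my $name (sort keys %dataPos) {
    my @positions = @{ $dataPos{$name} };
    print "$name: @positions\n";
}

# First and last occurrence of a repeated column, via an array slice.
my ($first, $last) = @{ $dataPos{RESULT} }[0, -1];
print "RESULT first at $first, last at $last\n";
```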
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: hash and array mismatch
by GrandFather (Saint) on May 08, 2007 at 02:07 UTC
You are using the same key for multiple values. You will only retain the last value assigned to the hash for a given key. For example, 'POSITION NAME' is assigned 9 and then 15; only 15 is retained.
Most likely a hash is not appropriate for whatever you are trying to achieve. You could however:
use warnings;
use strict;
use Data::Dump::Streamer;
my @data = <DATA>;
my %dataPos;
chomp @data;
push @{$dataPos{$data[$_]}}, $_ for 0 .. $#data;
Dump (\%dataPos);
__DATA__
WELL
TOTAL QUALITY
POSITION NAME

TOTAL QUALITY
POSITION NAME
Prints:
$HASH1 = {
"" => [ 3 ],
"POSITION NAME"
=> [
2,
5
],
"TOTAL QUALITY"
=> [
1,
4
],
WELL => [ 0 ]
};
Which creates a list of positions for each key value.
DWIM is Perl's answer to Gödel
push @{$dataPos{$data[$_]}}, $_ for 0 .. $#data;
You are pushing data onto an undefined value that is dereferenced as an array! The script should (I thought) die the first time through the for loop. But it works. So I did some experiments and read the docs.
I found out that you can do this:
my $foo; # $foo is undefined
push @$foo, 23;
print "I found @{$foo}\n";
Further testing showed that:
push @{ undef() }, 99;
Died as I had expected.
Further tests show that unshift works the same way. Also, pop and shift will create array references.
my $foo = undef;
shift @{$foo};
print "Foo: $foo\n"; # prints Foo: ARRAY(0x12121212)
The various docs don't mention this special behavior. Thanks for the education.
It is called "autovivification". Have a read through perlreftut and perlref.
You have seen it happen in a lightweight way whenever you assign a value to an array element that hadn't been used before. It is a little more interesting in the case of a reference which you are using for the first time, but it is one of the underpinnings of creating and managing "interesting" data structures in Perl, as indeed was the case here.
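A small sketch of autovivification building a nested structure, with made-up keys chosen only for illustration:

```perl
use strict;
use warnings;

my %tree;

# None of the intermediate levels exist yet. Perl creates the inner
# hash and the array reference on demand when they are written to.
push @{ $tree{fruit}{apple} }, 'red', 'green';

print "colours: @{ $tree{fruit}{apple} }\n";

# Caveat: merely looking through a missing level also creates it.
my $found = exists $tree{veg}{carrot};
print "carrot was not found\n" unless $found;
print "but 'veg' now exists\n" if exists $tree{veg};
```

The last three lines show the usual surprise: testing for a nested key brings the intermediate hash into existence even though the test itself is false.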
My pleasure, education is a large part of what PerlMonks is about.
Re: hash and array mismatch
by TOD (Friar) on May 08, 2007 at 02:07 UTC
That's fairly easy to answer: the hash keys are identical. You might work around this with something like the following:
my %keys = keys %data;
my $hkey = $data{$i};
if ((my @found = grep /^$hkey$/, %keys) > 1) {
$hkey .= '_';
}
$dataPos{$hkey} = $i;
--------------------------------
masses are the opiate for religion.
foreach my $i ( 0..$#data ) {
my %keys = keys %dataPos; # Changed hash name to match hash OP is building
my $hkey = $data[$i]; # Changed data lookup to array to match OP code.
if ((my @found = grep /^$hkey$/, %keys) > 1) {
$hkey .= '_';
}
$dataPos{$hkey} = $i;
print "$data[$i] -- > $i\n";
}
After I tweaked your code to work with the OP's code, there are still problems. You have an unnecessary nested loop that will grow with each pass through the outer loop. Also, if a key shows up three or more times, the last value will overwrite the value stored for the second key.
I've fixed these issues below.
foreach my $i ( 0..$#data ) {
my $key = $data[$i];
# append '_' until no matching key exists.
while ( exists $dataPos{$key} ) {
$key .= '_';
}
$dataPos{$key} = $i;
print "$key -- > $i\n";
}
Even after it's been fixed, I still wouldn't use this approach. I'd store repeated values in arrays as other posters have suggested.
Re: hash and array mismatch
by Moron (Curate) on May 08, 2007 at 09:56 UTC
Although it's clear that the lack of chop or chomp is putting incorrect (duplicate) garbage values in your hash, the point is that once you address the "" values there is a fair chance that the problem will go away at that point anyway -- see Text::CSV::Simple for an example solution where it definitely will! So I'd quickly move on to the quotes issue and not spend time on the issue as presented -- it might never happen!
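A core-Perl sketch of that kind of cleanup, assuming the stray values come from a trailing newline and from whitespace around the fields. The sample line is invented, and this does not handle quoted fields:

```perl
use strict;
use warnings;

# Hypothetical input line: note the space after one comma and the newline.
my $line = "WELL,TOTAL QUALITY, TOTAL QUALITY,EDITED\n";

chomp $line;                        # drop the trailing newline
my @data = split /,/, $line, -1;    # -1 keeps trailing empty fields
for (@data) {
    s/^\s+//;                       # trim leading whitespace
    s/\s+$//;                       # trim trailing whitespace
}

my %dataPos;
push @{ $dataPos{ $data[$_] } }, $_ for 0 .. $#data;

print "$_ => [@{ $dataPos{$_} }]\n" for sort keys %dataPos;
```

After trimming, the two spellings of TOTAL QUALITY collapse into one key holding both positions. For input with quoted fields or embedded commas, a real CSV parsing module is the safer route.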
__________________________________________________________________________________
^M Free your mind!
Re: hash and array mismatch
by jhourcle (Prior) on May 08, 2007 at 14:15 UTC
Lots of other people have already explained what the problem is ... however, in this particular case, it looks as if the data is denormalized, as there are two instances of the columns from 'TOTAL QUALITY' to 'EDITED'.
It looks to me as if the occurrence of a blank header is a sign that the next grouping starts, but I can't be sure with only two groups.
From the looks of things, you're looking at records from a health department that tracks well water, and you have two tests per well. Unfortunately, without knowing what you're trying to do with this data, I can't really make any recommendation on how best to deal with it in a meaningful way.
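If the blank header really does mark the end of a group, which is only a guess, the header row could be split into groups along these lines. The shortened header list and the variable names are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical, shortened header row; '' is assumed to end each group.
my @headers = ('WELL', 'SAMPLE ID', 'RESULT', 'QUALITY', '',
               'RESULT', 'QUALITY', '');

my @groups = ([]);    # list of groups, each a list of [name, position]
for my $i (0 .. $#headers) {
    if ($headers[$i] eq '') {
        push @groups, [];    # blank header: start a new group
        next;
    }
    push @{ $groups[-1] }, [ $headers[$i], $i ];
}
pop @groups unless @{ $groups[-1] };    # drop a trailing empty group

for my $g (0 .. $#groups) {
    my @cols = map { "$_->[0]=$_->[1]" } @{ $groups[$g] };
    print "group $g: ", join(', ', @cols), "\n";
}
```

Note that in this sketch the first group also contains the columns that come before the first test, so they would still need to be separated out.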
Re: hash and array mismatch
by Anonymous Monk on May 09, 2007 at 19:09 UTC
The keys corresponding to 9-13 were repeated later in your $line, which resulted in an overwrite of the values corresponding to those keys.
For example, from the first set of print statements, position 9 corresponds to "POSITION NAME", which would look something like:
$dataPos{"POSITION NAME"} = 9;
A little further in your list you'll see "POSITION NAME" at position fifteen; the assignment statement for this would look something like
$dataPos{"POSITION NAME"} = 15;
Since this statement is executed after the first, your 9 disappears, and only 15 is stored as the value for the key of "POSITION NAME".
Repeat the above for entries 10-13, and you should see those values got overwritten.
If you want to keep all this data, you might consider making the integers your keys, and saving the text as the values. Of course, you already did that when you created the @data array.
hth.
-hershmaster