multi dimensional hash

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

For sure an easy task. But I need some help. I have a file with three columns. I need to organize the information to obtain a data structure that mirrors the following: for each instance of $tag3 my data structure needs to have its frequency (count) and all the unique $tag1 that may come along with $tag3 (same row). ($tag2 is just a control). I have written the following script which creates a multi dimensional hash. The counting is done correctly, but what would be the best way to save all unique values of $tag1? In my script only the last $tag1 seen is kept.

Furthermore I have an ugly "Use of uninitialized value in addition (+) at .\myscript.pl line 14" the first time a $tag3 is inserted in the hash. How can I elegantly prevent it?

Example of result: for my $tag3='conference' the counter should be 3 (correctly done by the script), but I should also record the 2 unique $tag1 values, which are "conference" and "conferences" (I don't know the best way to do that).

    #!/usr/bin/perl
    use strict;
    use warnings;
     
    use Data::Dumper qw(Dumper);

    my $line = <DATA>;
    my %hash;
        while($line){
            my ($tag1, $tag2, $tag3) = split(/\t/, $line);
                if ($tag2 =~/NN/) {
                    $hash{$tag3}{frequency} = (($hash{$tag3}{frequency})+1);
                    $hash{$tag3}{variants} = $tag1;
                } 
        $line = <DATA>;
        }
    
     
    print Dumper \%hash;

    __DATA__
    The	DT	the
    International	NN	International
    for	IN	for
    well	NN	well
    preparation	NN	preparation
    preparation	NN	preparation
    in	IN	in
    conference	NN	conference
    conference	NN	conference
    conferences	NN	conference
    good	VVG	good

Replies are listed 'Best First'.
Re: multi dimensional hash by Cristoforo (Curate) on Mar 03, 2018 at 20:31 UTC
A hash of hashes might be a possible solution. Here I didn't keep a separate frequency but generated a frequency for each $tag1 as the value in the hash.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper qw(Dumper);
    use List::Util 'sum';

    open my $fh, '<', \<<EOF;
    The	DT	the
    International	NN	International
    for	IN	for
    well	NN	well
    preparation	NN	preparation
    preparation	NN	preparation
    in	IN	in
    conference	NN	conference
    conference	NN	conference
    conferences	NN	conference
    good	VVG	good
    EOF

    my %hash;
    while (<$fh>) {
        chomp;
        my ($tag1, $tag2, $tag3) = split /\t/;
        if ($tag2 =~/NN/) {
            $hash{$tag3}{$tag1}++;
        }
    }

    print Dumper \%hash;

    for my $tag (keys %hash) {
        printf "%s freq: %d\n", $tag, sum values %{ $hash{$tag} };
    }

Output:

    $VAR1 = {
              'well' => {
                          'well' => 1
                        },
              'International' => {
                                   'International' => 1
                                 },
              'conference' => {
                                'conference' => 2,
                                'conferences' => 1
                              },
              'preparation' => {
                                 'preparation' => 2
                               }
            };
    well freq: 1
    International freq: 1
    conference freq: 3
    preparation freq: 2
Re: multi dimensional hash by poj (Abbot) on Mar 03, 2018 at 20:24 UTC
"best way to save all unique values of $tag1"

You could add another level below variants

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %hash;
    while (my $line = <DATA>){
        my ($tag1, $tag2, $tag3) = split /\t/, $line;
        if ($tag2 =~/NN/) {
            ++$hash{$tag3}{frequency};
            ++$hash{$tag3}{variants}{$tag1};
        }
    }
    print Dumper \%hash;

poj
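A minimal sketch of how this structure answers both parts of the question. The sample rows are made up for illustration and hard-coded instead of read from `DATA`. The `++` operator treats a missing (undefined) value as 0 without warning, which is why the "uninitialized value" message from the original script goes away, and the unique $tag1 values are simply the keys of the `variants` sub-hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample rows (tab-separated): $tag1, $tag2, $tag3.
my @rows = (
    "conference\tNN\tconference",
    "conferences\tNN\tconference",
    "conference\tNN\tconference",
    "in\tIN\tin",
);

my %hash;
for my $row (@rows) {
    my ($tag1, $tag2, $tag3) = split /\t/, $row;
    next unless $tag2 =~ /NN/;

    # ++ on a not-yet-existing element starts from 0 and does not warn,
    # unlike the explicit "undef + 1" addition in the original script.
    ++$hash{$tag3}{frequency};
    ++$hash{$tag3}{variants}{$tag1};
}

for my $tag3 (sort keys %hash) {
    # The unique $tag1 values are the keys of the inner hash.
    my @variants = sort keys %{ $hash{$tag3}{variants} };
    printf "%s: frequency %d, variants: %s\n",
        $tag3, $hash{$tag3}{frequency}, join(', ', @variants);
}
```

With these four rows the loop prints one line, `conference: frequency 3, variants: conference, conferences`. As a side effect the inner hash also records how often each variant occurred.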
Re: multi dimensional hash by johngg (Canon) on Mar 04, 2018 at 00:05 UTC
Make the "variants" an array and push items onto it. Then sift out unique values once all the data has been read.

    use strict;
    use warnings;

    use Data::Dumper;

    open my $inFH, q{<}, \ <<EOD or die $!;
    The	DT	the
    International	NN	International
    for	IN	for
    well	NN	well
    preparation	NN	preparation
    preparation	NN	preparation
    in	IN	in
    conference	NN	conference
    conference	NN	conference
    conferences	NN	conference
    good	VVG	good
    EOD

    do { my $discard = <$inFH> };

    my %hash;
    while ( <$inFH> )
    {
        my @tags = split;
        next unless $tags[ 1 ] eq q{NN};
        $hash{ $tags[ 2 ] }->{ frequency } ++;
        push @{ $hash{ $tags[ 2 ] }->{ variants } }, $tags[ 0 ];
    }

    @{ $hash{ $_ }->{ variants } } = do {
        my %seen;
        grep { not $seen{ $_ } ++ } @{ $hash{ $_ }->{ variants } };
        } for keys %hash;

    print Data::Dumper->Dumpxs( [ \ %hash ], [ qw{ hash } ] );

The output.

    %hash = (
              'preparation' => {
                                 'frequency' => 2,
                                 'variants' => [
                                                 'preparation'
                                               ]
                               },
              'conference' => {
                                'frequency' => 3,
                                'variants' => [
                                                'conference',
                                                'conferences'
                                              ]
                              },
              'well' => {
                          'frequency' => 1,
                          'variants' => [
                                          'well'
                                        ]
                        },
              'International' => {
                                   'variants' => [
                                                   'International'
                                                 ],
                                   'frequency' => 1
                                 }
            );

I hope this is helpful.

Update: Perhaps simpler would be to keep a `->{ seen }->{ $tags[ 0 ] }` sub-sub-HoH to filter out duplicates and delete it at the end.

    ...
    while ( <$inFH> )
    {
        my @tags = split;
        next unless $tags[ 1 ] eq q{NN};
        $hash{ $tags[ 2 ] }->{ frequency } ++;
        push @{ $hash{ $tags[ 2 ] }->{ variants } }, $tags[ 0 ]
            unless $hash{ $tags[ 2 ] }->{ seen }->{ $tags[ 0 ] } ++;
    }

    delete $hash{ $_ }->{ seen } for keys %hash;
    ...

Cheers,

JohnGG
Re^2: multi dimensional hash by AnomalousMonk (Archbishop) on Mar 04, 2018 at 01:33 UTC
`do { my $discard = <$inFH> };`

I don't think Anonymous Monk really wants to discard the first line/record of data (update: that was my first thought, too); it's just an illusion created by the peculiar way he or she declares and pre-initializes the `$line` variable prior to entering the `while($line){ ... }` loop in the OPed code. Note also the odd way the next `$line` of data is read at the end of the `while`-loop in that code.

Update: I must also express my preference for the use of `List::Util::uniq()` (which used to be in List::MoreUtils, and still is!) rather than the explicit grep-ing to a hash that you're doing: it seems to express intent much more clearly for little or no cost.

Give a man a fish: `<%-{-{-{-<`
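For comparison, a small sketch of the `uniq()` variant, assuming a List::Util recent enough to provide it (the function was added in List::Util 1.45). The `%hash` contents here are hand-written stand-ins for what the reading loop would have collected.

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);   # uniq() is available from List::Util 1.45 on

# Hypothetical state after reading the data: variants still hold duplicates.
my %hash = (
    conference => {
        frequency => 3,
        variants  => [ qw(conference conference conferences) ],
    },
);

# Replace each variants list with its de-duplicated version.
# uniq() keeps the first occurrence of each value, in original order.
@{ $_->{variants} } = uniq @{ $_->{variants} } for values %hash;

print join(', ', @{ $hash{conference}{variants} }), "\n";
```

This does the same job as the `%seen`/`grep` idiom; on an older List::Util the `grep` form remains the portable choice.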
Re^3: multi dimensional hash by johngg (Canon) on Mar 04, 2018 at 10:00 UTC
I agree that `List::Util::uniq` would be nicer but it only made the transition to the core module in 5.26 which I don't have installed on this box (and I haven't installed List::MoreUtils yet after having to rebuild after Meltdown & Spectre patches killed it). You are right about the first line not being discarded, missed that entirely :-/

Cheers,

JohnGG
Re: multi dimensional hash by Marshall (Canon) on Mar 06, 2018 at 02:21 UTC
If tag1 and tag3 are equal, this is overly complicated. This line:

    conferences NN conference

might be wrong?

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper qw(Dumper);

    my %hash;
    while (my $line = <DATA>)
    {
        next if $line =~ /^\s*$/;  # skip blank lines
        my ($tag1, $tag2, $tag3) = split(/\s+/, $line);
        next unless $tag2 eq 'NN';
        $hash{$tag3}++;
    }

    print Dumper \%hash;

    =prints
    $VAR1 = {
              'well' => 1,
              'conference' => 3,
              'International' => 1,
              'preparation' => 2
            };
    =cut

    __DATA__
    The	DT	the
    International	NN	International
    for	IN	for
    well	NN	well
    preparation	NN	preparation
    preparation	NN	preparation
    in	IN	in
    conference	NN	conference
    conference	NN	conference
    conferences	NN	conference
    good	VVG	good
Re: multi dimensional hash by Anonymous Monk on Mar 05, 2018 at 14:07 UTC
Another trick is to use a hash-key like `"$hash1|$hash2"` with a single hash, so that the key is something like `"The|the"`. Perl adds new keys to the hash the first time they are used (a missing value counts as zero when it is incremented). When the time comes, you simply `split()` the hash-key strings.
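A rough sketch of the composite-key idea, using `"$tag1|$tag3"` as the key. It assumes that `|` never occurs in the data itself; the sample rows and the `%by_tag3` name are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample rows (tab-separated): $tag1, $tag2, $tag3.
my @rows = (
    "conference\tNN\tconference",
    "conferences\tNN\tconference",
    "conference\tNN\tconference",
    "preparation\tNN\tpreparation",
    "good\tVVG\tgood",
);

# One flat hash, keyed on "form|lemma".
my %count;
for my $row (@rows) {
    my ($tag1, $tag2, $tag3) = split /\t/, $row;
    next unless $tag2 eq 'NN';
    $count{"$tag1|$tag3"}++;
}

# Later, split the keys again to rebuild per-$tag3 information.
my %by_tag3;
for my $key (keys %count) {
    my ($tag1, $tag3) = split /\|/, $key, 2;
    $by_tag3{$tag3}{frequency} += $count{$key};    # += on a new element does not warn
    push @{ $by_tag3{$tag3}{variants} }, $tag1;    # each key is unique, so no duplicates
}

for my $tag3 (sort keys %by_tag3) {
    printf "%s: %d (%s)\n", $tag3, $by_tag3{$tag3}{frequency},
        join(', ', sort @{ $by_tag3{$tag3}{variants} });
}
```

The flat hash is compact, but every lookup by $tag3 alone needs a pass over all keys, so the nested structures shown in the other replies are usually easier to query.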
Re: multi dimensional hash by Anonymous Monk on Mar 04, 2018 at 07:47 UTC
Thank you very much for all your suggestions, guys. I've learnt a lot: not only how to get what I wanted, but also some other nice things, like the way I used $line = <DATA>;. In the end I opted for an array to store the "variants", even though all the other suggestions were great. Eager to learn more about handling multi dimensional hashes/arrays.
Re^2: multi dimensional hash by AnomalousMonk (Archbishop) on Mar 04, 2018 at 17:03 UTC
"Eager to learn more on handling multi dimensional hashes/arrays."

See also Perl Data Structures Cookbook (perldsc) and perllol.

Give a man a fish: `<%-{-{-{-<`
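As a small taste of what those documents cover, here is a sketch that walks the kind of structure this thread ended up with (a hash of hashes whose `variants` entry is an array reference). The data is hand-written for illustration and the `@lines` array is only there to collect the report.

```perl
use strict;
use warnings;

# Hypothetical data in the shape discussed above.
my %hash = (
    conference  => { frequency => 3, variants => [ 'conference', 'conferences' ] },
    preparation => { frequency => 2, variants => [ 'preparation' ] },
    well        => { frequency => 1, variants => [ 'well' ] },
);

my @lines;
# Sort by descending frequency, then alphabetically by key.
for my $tag3 ( sort { $hash{$b}{frequency} <=> $hash{$a}{frequency} or $a cmp $b } keys %hash ) {
    my $entry    = $hash{$tag3};            # inner hash reference
    my @variants = @{ $entry->{variants} }; # dereference the array reference
    push @lines, sprintf "%s: %d occurrence(s), %d variant(s): %s",
        $tag3, $entry->{frequency}, scalar @variants, join(', ', @variants);
}

print "$_\n" for @lines;
```

The two things to get used to are the arrow for reaching into a reference (`$entry->{frequency}`) and the `@{ ... }` wrapper for turning an array reference back into a list.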

