PerlMonks  

multi dimensional hash

by Anonymous Monk
on Mar 03, 2018 at 19:36 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

For sure an easy task. But I need some help. I have a file with three columns. I need to organize the information to obtain a data structure that mirrors the following: for each instance of $tag3 my data structure needs to have its frequency (count) and all the unique $tag1 that may come along with $tag3 (same row). ($tag2 is just a control). I have written the following script which creates a multi dimensional hash. The counting is done correctly, but what would be the best way to save all unique values of $tag1? In my script only the last $tag1 seen is kept.

Furthermore, I get an ugly "Use of uninitialized value in addition (+) at .\myscript.pl line 14" warning the first time a $tag3 is inserted in the hash. How can I elegantly prevent it?

Example of result: for my $tag3='conference' the counter should be 3 (correctly done by the script), but I should also register the 2 unique $tag1 values, which are "conference" and "conferences" (I don't know the best way to do this).

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my $line = <DATA>;
my %hash;

while($line){
    my ($tag1, $tag2, $tag3) = split(/\t/, $line);
    if ($tag2 =~/NN/) {
        $hash{$tag3}{frequency} = (($hash{$tag3}{frequency})+1);
        $hash{$tag3}{variants} = $tag1;
    }
    $line = <DATA>;
}
print Dumper \%hash;

__DATA__
The	DT	the
International	NN	International
for	IN	for
well	NN	well
preparation	NN	preparation
preparation	NN	preparation
in	IN	in
conference	NN	conference
conference	NN	conference
conferences	NN	conference
good	VVG	good

Replies are listed 'Best First'.
Re: multi dimensional hash
by Cristoforo (Curate) on Mar 03, 2018 at 20:31 UTC
    A hash of hashes might be a possible solution. Here I didn't keep a separate frequency but generated a frequency for each $tag1 as the value to the hash.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper qw(Dumper);
    use List::Util 'sum';

    open my $fh, '<', \<<EOF;
    The	DT	the
    International	NN	International
    for	IN	for
    well	NN	well
    preparation	NN	preparation
    preparation	NN	preparation
    in	IN	in
    conference	NN	conference
    conference	NN	conference
    conferences	NN	conference
    good	VVG	good
    EOF

    my %hash;
    while (<$fh>) {
        chomp;
        my ($tag1, $tag2, $tag3) = split /\t/;
        if ($tag2 =~/NN/) {
            $hash{$tag3}{$tag1}++;
        }
    }
    print Dumper \%hash;

    for my $tag (keys %hash) {
        printf "%s freq: %d\n", $tag, sum values %{ $hash{$tag} };
    }

    Output:

    $VAR1 = {
              'well' => {
                          'well' => 1
                        },
              'International' => {
                                   'International' => 1
                                 },
              'conference' => {
                                'conference' => 2,
                                'conferences' => 1
                              },
              'preparation' => {
                                 'preparation' => 2
                               }
            };
    well freq: 1
    International freq: 1
    conference freq: 3
    preparation freq: 2
Re: multi dimensional hash
by poj (Abbot) on Mar 03, 2018 at 20:24 UTC
    best way to save all unique values of $tag1

    You could add another level below variants

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %hash;
    while (my $line = <DATA>){
        my ($tag1, $tag2, $tag3) = split /\t/, $line;
        if ($tag2 =~/NN/) {
            ++$hash{$tag3}{frequency};
            ++$hash{$tag3}{variants}{$tag1};
        }
    }
    print Dumper \%hash;
    poj
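A minimal sketch of why the `++` form also avoids the "uninitialized value" warning from the question: `++` treats a missing element as zero without warning, whereas adding 1 to an undefined value warns. With the extra level under variants, the unique $tag1 values are simply the keys of the inner hash. The sample rows and the printing loop below are illustrative assumptions, not part of poj's post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative rows (tag1, tag2, tag3); assumed sample data.
my @rows = (
    [ 'conference',  'NN', 'conference' ],
    [ 'conference',  'NN', 'conference' ],
    [ 'conferences', 'NN', 'conference' ],
    [ 'in',          'IN', 'in' ],
);

my %hash;
for my $row (@rows) {
    my ($tag1, $tag2, $tag3) = @$row;
    next unless $tag2 =~ /NN/;
    # ++ on a not-yet-existing element starts from 0 and does not warn,
    # unlike ($hash{$tag3}{frequency}) + 1 on the first occurrence.
    ++$hash{$tag3}{frequency};
    ++$hash{$tag3}{variants}{$tag1};
}

for my $tag3 (sort keys %hash) {
    # The keys of the inner hash are the unique tag1 values.
    my @variants = sort keys %{ $hash{$tag3}{variants} };
    printf "%s: %d (%s)\n", $tag3, $hash{$tag3}{frequency}, join(', ', @variants);
}
# prints: conference: 3 (conference, conferences)
```

The inner counts also record how often each variant occurred, which comes for free with this layout.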
Re: multi dimensional hash
by johngg (Canon) on Mar 04, 2018 at 00:05 UTC

    Make the "variants" an array and push items onto it. Then sift out unique values once all the data has been read.

    use strict;
    use warnings;

    use Data::Dumper;

    open my $inFH, q{<}, \ <<EOD or die $!;
    The DT the
    International NN International
    for IN for
    well NN well
    preparation NN preparation
    preparation NN preparation
    in IN in
    conference NN conference
    conference NN conference
    conferences NN conference
    good VVG good
    EOD

    do { my $discard = <$inFH> };

    my %hash;

    while ( <$inFH> )
    {
        my @tags = split;
        next unless $tags[ 1 ] eq q{NN};
        $hash{ $tags[ 2 ] }->{ frequency } ++;
        push @{ $hash{ $tags[ 2 ] }->{ variants } }, $tags[ 0 ];
    }

    @{ $hash{ $_ }->{ variants } } = do {
        my %seen;
        grep { not $seen{ $_ } ++ } @{ $hash{ $_ }->{ variants } };
        } for keys %hash;

    print Data::Dumper->Dumpxs( [ \ %hash ], [ qw{ *hash } ] );

    The output.

    %hash = (
              'preparation' => {
                                 'frequency' => 2,
                                 'variants' => [
                                                 'preparation'
                                               ]
                               },
              'conference' => {
                                'frequency' => 3,
                                'variants' => [
                                                'conference',
                                                'conferences'
                                              ]
                              },
              'well' => {
                          'frequency' => 1,
                          'variants' => [
                                          'well'
                                        ]
                        },
              'International' => {
                                   'variants' => [
                                                   'International'
                                                 ],
                                   'frequency' => 1
                                 }
            );

    I hope this is helpful.

    Update: Perhaps simpler would be to keep a ->{ seen }->{ $tags[ 0 ] } sub-sub-HoH to filter out duplicates and delete it at the end.

    ...
    while ( <$inFH> )
    {
        my @tags = split;
        next unless $tags[ 1 ] eq q{NN};
        $hash{ $tags[ 2 ] }->{ frequency } ++;
        push @{ $hash{ $tags[ 2 ] }->{ variants } }, $tags[ 0 ]
            unless $hash{ $tags[ 2 ] }->{ seen }->{ $tags[ 0 ] } ++;
    }

    delete $hash{ $_ }->{ seen } for keys %hash;
    ...

    Cheers,

    JohnGG

      do { my $discard = <$inFH> };

      I don't think Anonymous Monk really wants to discard the first line/record of data (update: that was my first thought, too); it's just an illusion created by the peculiar way he or she declares and pre-initializes the  $line variable prior to entering the  while($line){ ... } loop in the OPed code. Note also the odd way the next  $line of data is read at the end of the while-loop in that code.
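For comparison, a minimal sketch of the more usual read loop, where the record is read in the loop condition itself so no line is read before the loop and none needs to be discarded. The in-memory filehandle and its two sample rows are assumptions for illustration; the OP reads from DATA instead.

```perl
use strict;
use warnings;

# In-memory filehandle standing in for the real input (assumed sample data).
my $data = "The\tDT\tthe\nInternational\tNN\tInternational\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @first_fields;
# Declaring $line in the loop condition reads one record per iteration,
# starting with the first; the loop ends when the handle is exhausted.
while ( my $line = <$fh> ) {
    chomp $line;    # drop the trailing newline so it does not end up in $tag3
    my ($tag1, $tag2, $tag3) = split /\t/, $line;
    push @first_fields, $tag1;
    print "$tag1 / $tag2 / $tag3\n";
}
close $fh;
```

The chomp matters here: without it, the last field keeps its newline and becomes part of the hash key.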

      Update: I must also express my preference for the use of List::Util::uniq() (which used to be in List::MoreUtils — and still is!) rather than the explicit grep-ing to a hash that you're doing: it seems to express intent much more clearly for little or no cost.
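A minimal sketch of the uniq() variant, assuming List::Util 1.45 or later is installed (the version in which uniq was added). The small pre-filled %hash is an assumed stand-in for the structure built by the read loop.

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);    # uniq is available from List::Util 1.45

# Assumed stand-in for the structure built while reading the data.
my %hash = (
    conference => {
        frequency => 3,
        variants  => [ 'conference', 'conference', 'conferences' ],
    },
);

# Replace each variants list with its de-duplicated version;
# uniq keeps the first occurrence of each value, in order.
@{ $_->{variants} } = uniq @{ $_->{variants} } for values %hash;

print join(', ', @{ $hash{conference}{variants} }), "\n";
# prints: conference, conferences
```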


      Give a man a fish:  <%-{-{-{-<

        I agree that List::Util::uniq would be nicer but it only made the transition to the core module in 5.26 which I don't have installed on this box (and I haven't installed List::MoreUtils yet after having to rebuild after Meltdown & Spectre patches killed it). You are right about the first line not being discarded, missed that entirely :-/

        Cheers,

        JohnGG

Re: multi dimensional hash
by Marshall (Canon) on Mar 06, 2018 at 02:21 UTC
    If tag1 and tag3 are equal, this is overly complicated.
    This line:
    conferences NN conference
    might be wrong?
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper qw(Dumper);

    my %hash;
    while (my $line = <DATA>)
    {
        next if $line =~ /^\s*$/;  # skip blank lines
        my ($tag1, $tag2, $tag3) = split(/\s+/, $line);
        next unless $tag2 eq 'NN';
        $hash{$tag3}++;
    }
    print Dumper \%hash;

    =prints
    $VAR1 = {
              'well' => 1,
              'conference' => 3,
              'International' => 1,
              'preparation' => 2
            };
    =cut

    __DATA__
    The DT the
    International NN International
    for IN for
    well NN well
    preparation NN preparation
    preparation NN preparation
    in IN in
    conference NN conference
    conference NN conference
    conferences NN conference
    good VVG good
Re: multi dimensional hash
by Anonymous Monk on Mar 05, 2018 at 14:07 UTC
    Another trick is to use a composite hash key like "$tag1|$tag3" with a single hash, so that the key is something like "The|the". Incrementing a key that does not exist yet simply creates it (the undefined value counts as zero under ++). When the time comes, you simply split() the hash-key strings.
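A minimal sketch of the composite-key idea, assuming the tags never contain the "|" separator. The sample rows and the names %count, %frequency, and %variants are illustrative, not from the post.

```perl
use strict;
use warnings;

# Illustrative rows (tag1, tag2, tag3); assumed sample data.
my @rows = (
    [ 'conference',  'NN', 'conference' ],
    [ 'conference',  'NN', 'conference' ],
    [ 'conferences', 'NN', 'conference' ],
);

my %count;
for my $row (@rows) {
    my ($tag1, $tag2, $tag3) = @$row;
    next unless $tag2 eq 'NN';
    $count{"$tag1|$tag3"}++;    # one flat hash, composite key
}

my (%frequency, %variants);
for my $key (keys %count) {
    # The pipe must be escaped: a bare | is regex alternation.
    my ($tag1, $tag3) = split /\|/, $key, 2;
    $frequency{$tag3} += $count{$key};
    push @{ $variants{$tag3} }, $tag1;
}

for my $tag3 (sort keys %frequency) {
    print "$tag3: $frequency{$tag3} (", join(', ', sort @{ $variants{$tag3} }), ")\n";
}
# prints: conference: 3 (conference, conferences)
```

The flat hash is compact, but the nested layouts in the other replies avoid the second pass and the separator assumption.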
Re: multi dimensional hash
by Anonymous Monk on Mar 04, 2018 at 07:47 UTC

    Thank you very much for all your suggestions, guys. I've learnt a lot. Not only how to reach what I wanted, but also some other nice things, like the problem with the way I used $line = <DATA>;. In the end I opted for an array to store the "variants", even if all other suggestions were great. Eager to learn more on handling multi dimensional hashes/arrays.
