PerlMonks  

Problems with complex structure and array of arrays

by remluvr (Sexton)
on Mar 01, 2012 at 22:09 UTC (id://957333)

remluvr has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks! Today seems to be my day for problems I can't solve. Two years without coding plus trouble understanding complex data structures apparently generates a lot of problems!
So, here's my problem. I have this kind of input:

    frog-n as novelty-n 5.8504
    frog-n be yellow-n 6.1961
    frog-n be-1 Asia-n 5.0937
    frog-n coord zebra-n 5.9279
    frog-n coord-1 Canuck-n 6.3363
    frog-n nmod-1 mule-n 4.2881
    amphibian-n success-1 surprising-j 14.6340
    amphibian-n such_as alligator-n 11.5265
    amphibian-n than work-n 5.9948
    amphibian-n though stalk-n 13.2228

and my output should be a sort of "matrix", like the following:

    frog-n as_novelty-n,5.8504 be_yellow-n,6.1961 be-1_Asia-n,5.0937 coord_zebra-n,5.9279 coord-1_Canuck-n,6.3363 nmod-1_mule-n,4.2881
    amphibian-n success-1_surprising-j,14.6340 such_as_alligator-n,11.5265 than_work-n,5.9948 though_stalk-n,13.2228

Basically, the element in the first column of the input file is the row key, and each row lists the elements of the 2nd and 3rd columns joined with an underscore, followed by the corresponding score.

I managed to do the following:

    my $prefix = shift;
    my $input  = shift;
    my $file   = $prefix . ".txt";
    if (-e $file) {
        print STDERR "$file already exists, deleting previous version\n";
        `rm -f $file`;
    }
    my $debug = 0; # Debug variable. Set to 1 while debugging, used for
    my %seen = ();
    my @global_els = ();
    my @row_els = ();
    my %score_of = ();
    my $row_el;
    my $gram;
    my $col_el;
    my $score_of;
    my $score;
    my $global_el;
    open INPUT, $input;
    while (<INPUT>) {
        chomp;
        ($row_el, $gram, $col_el, $score) = split "[\t ]+", $_;
        $global_el = $gram . "_" . $col_el;
        if (!($seen{"glob"}{$global_el}++)) {
            push @global_els, $global_el;
        }
        if (!$seen{"row"}{$row_el}++) {
            push @row_els, $row_el;
        }
        $score_of{$row_el}{$global_el} = $score;
        if ($debug) {
            print "Check:" . $row_el . "=>" . $global_el . "=>" . $score;
        }
    }
    close INPUT;
    #@global_els = ();
    #@row_els = ();
    open MATRIX, ">$file";
    #my $score_b=$score_of{$row_el}{$global_el};
    foreach $row_el (@row_els) {
        print MATRIX "\t", $row_el;
        foreach $global_el (@global_els) {
            print MATRIX "\t", $global_el;
            print MATRIX ",", $score_of{$row_el}{$global_el};
        }
        print MATRIX "\n";
    }
    close MATRIX;

But my output is wrong: every joined element appears on both lines, even those that are not related to that line's key. For example, with the data above I get something like:

    frog-n as_novelty-n,5.8504 be_yellow-n,6.1961 be-1_Asia-n,5.0937 coord_zebra-n,5.9279 coord-1_Canuck-n,6.3363 nmod-1_mule-n,4.2881 success-1_surprising-j, such_as_alligator-n, than_work-n, though_stalk-n,
    amphibian-n success-1_surprising-j,14.6340 such_as_alligator-n,11.5265 than_work-n,5.9948 though_stalk-n,13.2228 as_novelty-n, be_yellow-n, be-1_Asia-n, coord_zebra-n, coord-1_Canuck-n, nmod-1_mule-n,

What did I get wrong? How can I improve it?
Thanks everyone,
Giulia

Re: Problems with complex structure and array of arrays
by Eliya (Vicar) on Mar 01, 2012 at 22:24 UTC

    If I'm understanding you correctly, you could just collect things in a hash, keyed by the first column:

        my %rows;
        while (<DATA>) {
            my ($key, $c1, $c2, $c3) = split;
            push @{ $rows{$key} }, "${c1}_$c2,$c3";
        }
        for my $key (keys %rows) {
            print join("\t", $key, @{ $rows{$key} }), "\n";
        }
        __DATA__
        frog-n as novelty-n 5.8504
        frog-n be yellow-n 6.1961
        frog-n be-1 Asia-n 5.0937
        frog-n coord zebra-n 5.9279
        frog-n coord-1 Canuck-n 6.3363
        frog-n nmod-1 mule-n 4.2881
        amphibian-n success-1 surprising-j 14.6340
        amphibian-n such_as alligator-n 11.5265
        amphibian-n than work-n 5.9948
        amphibian-n though stalk-n 13.2228

    Output:

        frog-n as_novelty-n,5.8504 be_yellow-n,6.1961 be-1_Asia-n,5.0937 coord_zebra-n,5.9279 coord-1_Canuck-n,6.3363 nmod-1_mule-n,4.2881
        amphibian-n success-1_surprising-j,14.6340 such_as_alligator-n,11.5265 than_work-n,5.9948 though_stalk-n,13.2228

    (Note that the rows in the unsorted hash printout appear in the same order as the keys occurred in the input data only "by accident". If ordering matters, you'd need to handle it separately (e.g. using Tie::IxHash).)
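    Tie::IxHash is a CPAN module. If you'd rather avoid the dependency, one alternative (my own sketch, not something posted in this thread) is to remember each key in an array the first time it is seen and print the rows in that order. The sample lines are a few lines from the question, deliberately interleaved to show that grouping still works; build_rows is a hypothetical helper name.

```perl
use strict;
use warnings;

# A few lines from the question, interleaved on purpose (illustrative only).
my @lines = (
    "frog-n as novelty-n 5.8504",
    "frog-n be yellow-n 6.1961",
    "amphibian-n such_as alligator-n 11.5265",
    "frog-n coord zebra-n 5.9279",
);

# Hypothetical helper: groups "col2_col3,score" entries under the
# first-column key and returns one tab-separated line per key,
# in the order each key was first seen.
sub build_rows {
    my @input = @_;
    my %rows;    # key => [ "col2_col3,score", ... ]
    my @order;   # keys in order of first appearance
    for my $line (@input) {
        my ($key, $c1, $c2, $score) = split ' ', $line;
        next unless defined $score;              # skip blank or short lines
        push @order, $key unless exists $rows{$key};
        push @{ $rows{$key} }, "${c1}_$c2,$score";
    }
    return map { join("\t", $_, @{ $rows{$_} }) . "\n" } @order;
}

print build_rows(@lines);
```

    The extra array costs one push per new key and keeps the output order deterministic without tying the hash.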

Re: Problems with complex structure and array of arrays
by planetscape (Chancellor) on Mar 02, 2012 at 07:22 UTC

      Thanks, I'll study the link you suggested. If you have any other suggestions regarding complex data structures, I'll be happy to hear them, since complex data structures seem to be my biggest problem.
      Thanks again.
      Giulia

Re: Problems with complex structure and array of arrays
by tangent (Parson) on Mar 01, 2012 at 22:45 UTC
    The array @global_els contains all elements, not just the elements associated with each $row_el. Although there are better ways to achieve what you want, you could fix your code by adding one line:
        foreach $row_el (@row_els) {
            print "\t", $row_el;
            foreach $global_el (@global_els) {
                # add this line...
                next unless exists $score_of{$row_el}{$global_el};
                print "\t", $global_el;
                print ",", $score_of{$row_el}{$global_el};
            }
            print "\n";
        }
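    For reference, here is a sketch of the question's script with that one-line fix applied, under use strict and use warnings, with lexical filehandles and three-argument open. It reads from an in-memory string instead of a file so it is self-contained, builds the matrix as a string, and starts each line with the key rather than a leading tab. matrix_from_fh is a hypothetical name, and the three input lines are a subset of the question's data.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the matrix text from a filehandle, keeping
# only the (gram_col, score) pairs that actually occur for each row key.
sub matrix_from_fh {
    my ($in) = @_;
    my (%seen, @row_els, @global_els, %score_of);
    while (my $line = <$in>) {
        chomp $line;
        my ($row_el, $gram, $col_el, $score) = split /[\t ]+/, $line;
        next unless defined $score;              # ignore blank/short lines
        my $global_el = "${gram}_$col_el";
        push @global_els, $global_el unless $seen{"glob"}{$global_el}++;
        push @row_els,    $row_el    unless $seen{"row"}{$row_el}++;
        $score_of{$row_el}{$global_el} = $score;
    }
    my $out = '';
    for my $row_el (@row_els) {
        $out .= $row_el;
        for my $global_el (@global_els) {
            # the fix: skip columns this row never had
            next unless exists $score_of{$row_el}{$global_el};
            $out .= "\t$global_el,$score_of{$row_el}{$global_el}";
        }
        $out .= "\n";
    }
    return $out;
}

my $text = <<'END';
frog-n as novelty-n 5.8504
frog-n be yellow-n 6.1961
amphibian-n than work-n 5.9948
END

open my $in, '<', \$text or die "Cannot open in-memory data: $!";
print matrix_from_fh($in);
close $in;
```

    To write to a file as in the original, open a lexical handle with open my $out_fh, '>', $file and print the returned string to it.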
    BTW, if you had used warnings you would have got a host of "Use of uninitialized value" warnings. Data::Dumper is a great help in these situations too.
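    As an illustration of that advice, a small sketch that dumps a hash of hashes shaped like %score_of in the question, built from two of the question's lines. The dump makes it visible that each row key owns only its own pairs, which is why printing every @global_els entry for every row produced the empty "name," columns.

```perl
use strict;
use warnings;
use Data::Dumper;

# Same shape as %score_of in the question: row key => { gram_col => score }.
my %score_of;
for my $line ("frog-n be yellow-n 6.1961", "amphibian-n than work-n 5.9948") {
    my ($row_el, $gram, $col_el, $score) = split ' ', $line;
    $score_of{$row_el}{"${gram}_$col_el"} = $score;
}

# Sorted keys make the dump stable between runs; Indent = 1 keeps it compact.
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;
print Dumper(\%score_of);
```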
