Hello Monks!
It looks like today is my day for problems I cannot solve. Apparently two years without coding, plus trouble understanding complex data structures, can generate a lot of problems!
So, here's my problem. I have this kind of input:
frog-n as novelty-n 5.8504
frog-n be yellow-n 6.1961
frog-n be-1 Asia-n 5.0937
frog-n coord zebra-n 5.9279
frog-n coord-1 Canuck-n 6.3363
frog-n nmod-1 mule-n 4.2881
amphibian-n success-1 surprising-j 14.6340
amphibian-n such_as alligator-n 11.5265
amphibian-n than work-n 5.9948
amphibian-n though stalk-n 13.2228
and my output should be a "matrix", so to speak, like the following:
frog-n as_novelty-n,5.8504 be_yellow-n,6.1961 be-1_Asia-n,5.0937 coord_zebra-n,5.9279 coord-1_Canuck-n,6.3363 nmod-1_mule-n,4.2881
amphibian-n success-1_surprising-j,14.6340 such_as_alligator-n,11.5265 than_work-n,5.9948 though_stalk-n,13.2228
Basically, the element in the first column of the input file is the key, and each value is the 2nd and 3rd columns joined with an underscore, followed by a comma and the corresponding score.
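To make the intended mapping concrete, here is a minimal sketch of what should happen to a single input line, assuming the fields are separated by whitespace. The name `join_feature` is a hypothetical helper used only for illustration; it is not part of my script.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one input line into (row key, "gram_col,score").
sub join_feature {
    my ($line) = @_;
    # split ' ' splits on any run of whitespace and ignores leading whitespace
    my ($row_el, $gram, $col_el, $score) = split ' ', $line;
    return ($row_el, "${gram}_${col_el},$score");
}

my ($key, $cell) = join_feature("frog-n as novelty-n 5.8504");
print "$key\t$cell\n";   # prints frog-n, a tab, then as_novelty-n,5.8504
```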
I managed to do the following:
my $prefix = shift;
my $input = shift;
my $file = $prefix . ".txt";
if (-e $file) {
    print STDERR "$file already exists, deleting previous version\n";
    `rm -f $file`;
}
my $debug=0; # Debug variable. Set to 1 while debugging; used for
my %seen = ();
my @global_els = ();
my @row_els = ();
my %score_of = ();
my $row_el;
my $gram;
my $col_el;
my $score_of;
my $score;
my $global_el;
open INPUT,$input;
while(<INPUT>){
    chomp;
    ($row_el,$gram,$col_el,$score) = split "[\t ]+",$_;
    $global_el=$gram."_".$col_el;
    if (!($seen{"glob"}{$global_el}++)) {
        push @global_els,$global_el;
    }
    if (!$seen{"row"}{$row_el}++) {
        push @row_els,$row_el;
    }
    $score_of{$row_el}{$global_el} = $score;
    if($debug){
        print "Check:".$row_el."=>".$global_el."=>".$score;
    }
}
close INPUT;
#@global_els = ();
#@row_els = ();
open MATRIX,">$file";
#my $score_b=$score_of{$row_el}{$global_el};
foreach $row_el (@row_els) {
    print MATRIX "\t",$row_el;
    foreach $global_el (@global_els) {
        print MATRIX "\t",$global_el;
        print MATRIX ",",$score_of{$row_el}{$global_el};
    }
    print MATRIX "\n";
}
close MATRIX;
But my output is wrong: all the joined elements appear on both lines, even when they are not related to the key on that line. For example, the output I get from the data above is:
frog-n as_novelty-n,5.8504 be_yellow-n,6.1961 be-1_Asia-n,5.0937 coord_zebra-n,5.9279 coord-1_Canuck-n,6.3363 nmod-1_mule-n,4.2881 success-1_surprising-j, such_as_alligator-n, than_work-n, though_stalk-n,
amphibian-n success-1_surprising-j,14.6340 such_as_alligator-n,11.5265 than_work-n,5.9948 though_stalk-n,13.2228 as_novelty-n, be_yellow-n, be-1_Asia-n, coord_zebra-n, coord-1_Canuck-n, nmod-1_mule-n,
What did I get wrong? How can I improve it?
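My guess is that the inner loop is the culprit, since it walks `@global_els`, which collects the joined elements of every row. One direction I am considering, as a minimal sketch and not a verified fix, is to keep a separate list of joined elements per row key. The name `build_rows` is hypothetical, and the sketch assumes whitespace-separated input and tab-separated output.

```perl
use strict;
use warnings;

# Hypothetical helper: build matrix lines from input lines, keeping for each
# row key only the joined elements that actually occurred with that key.
sub build_rows {
    my (@lines) = @_;
    my (@row_els, %cols_of);
    for my $line (@lines) {
        my ($row_el, $gram, $col_el, $score) = split ' ', $line;
        next unless defined $score;    # skip blank or incomplete lines
        # remember the order in which row keys first appear
        push @row_els, $row_el unless exists $cols_of{$row_el};
        push @{ $cols_of{$row_el} }, "${gram}_${col_el},$score";
    }
    # one tab-separated line per row key, in first-seen order
    return map { join "\t", $_, @{ $cols_of{$_} } } @row_els;
}

print "$_\n" for build_rows(
    "frog-n as novelty-n 5.8504",
    "frog-n be yellow-n 6.1961",
    "amphibian-n than work-n 5.9948",
);
```

With this layout each line would only list the elements stored under its own key, so the empty `name,` cells should not appear.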
Thanks everyone,
Giulia