Just for my own amusement, a one-liner solution:
# warning: Windows double quotes (on a Unix shell use single quotes instead)
perl -lnE "$ar[0]++; $ar[pos]{$1}++ while /(.)/g;
END{ foreach $row (1..$#ar){print join qq(\t),$row,map{$_,sprintf('%.2f',$ar[$row]{$_}/$ar[0] )} sort keys %{$ar[$row]}} }" freq.txt
1 A 0.75 B 0.25
2 A 1.00
3 A 0.25 B 0.50 C 0.25
4 B 1.00
5 B 0.25 C 0.50 D 0.25
The data structure created is an array whose first element, $ar[0], is a scalar holding the number of lines processed. That slot is free because you do not need to track a char at position 0: after each single-char match, pos returns the offset just past that char, so positions start from 1. The other elements are anonymous hashes whose keys are your chars and whose values are the occurrences found at the position given by the index of the @ar element being processed.
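To see why the positions start from 1, here is a minimal sketch of what pos reports while /(.)/g walks a string. The helper name char_positions and the sample string 'ABC' are made up for illustration; they are not part of the one-liner or of freq.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [char, pos] pairs for one line.
sub char_positions {
    my ($line) = @_;
    my @pairs;
    while ($line =~ /(.)/g) {
        # pos($line) is the offset just past the char matched by (.)
        push @pairs, [ $1, pos($line) ];
    }
    return @pairs;
}

# 'ABC' is made-up sample input, not a line from freq.txt.
printf "%s is at position %d\n", @$_ for char_positions('ABC');
```

This prints A at position 1, B at position 2 and C at position 3, which is why index 0 of @ar is never used for a char and can hold the line count.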
See the data structure with the help of Data::Dump:
perl -MData::Dump -lnE "$ar[0]++; $ar[pos]{$1}++ while /(.)/g;
END{ dd @ar }" freq.txt
(
4, # el 0 is the lines count
{ A => 3, B => 1 }, # el 1 contains occurrences found at position 1
{ A => 4 }, # el 2 .. so on
{ A => 1, B => 2, C => 1 },
{ B => 4 },
{ B => 1, C => 2, D => 1 },
)
Deparsing the first one-liner shows the whole picture, commented:
perl -MO=Deparse -lnE "$ar[0]++; $ar[pos]{$1}++ while /(.)/g;
END{ foreach $row (1..$#ar){print join qq(\t),$row,map{$_,sprintf('%.2f',$ar[$row]{$_}/$ar[0] )} sort keys %{$ar[$row]}}}" freq.txt
BEGIN { $/ = "\n"; $\ = "\n"; } # implicit initialization
BEGIN {
    $^H{'feature_unicode'} = q(1);
    $^H{'feature_say'} = q(1);
    $^H{'feature_state'} = q(1);
    $^H{'feature_switch'} = q(1);
}
# our program:
LINE: while (defined($_ = <ARGV>)) {     # reading all files because of perl -n
    chomp $_;                            # automatic handling of end of line given by perl -l
    ++$ar[0];                            # el 0 keeps track of lines processed
    ++$ar[pos $_]{$1} while /(.)/g;      # /(.)/g returns every char in turn, setting $1 to
                                         # the char and making pos return its position,
                                         # so with ++ we increment the occurrences of the char
                                         # given by $1 found at the position given by pos
    sub END {
        foreach $row (1 .. $#ar) {       # now we process the rows of the array starting
                                         # from 1, because positions coincide with array indexes
            print join("\t",             # joining all of the following with a tab
                $row,                    # the row is equal to the position in the string
                map({                    # then for each key of the hash (the el. $row of @ar)
                    $_,                  # the sorted key
                    sprintf('%.2f', $ar[$row]{$_} / $ar[0]);   # its value divided by the line count, formatted
                } sort(keys %{$ar[$row];})));
        }
    }
    ;
}
-e syntax OK
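For anyone who prefers a script to a one-liner, the same logic can be sketched under strict and warnings roughly like this. The sub name position_frequencies and the two sample lines are my own assumptions for illustration; the sample is not the original freq.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: takes chomped lines and returns one tab-separated
# row per char position, in the same layout the one-liner's END block prints.
sub position_frequencies {
    my @lines = @_;
    my @ar;
    for my $line (@lines) {
        $ar[0]++;                                        # el 0: lines seen
        $ar[ pos($line) ]{$1}++ while $line =~ /(.)/g;   # el N: chars seen at position N
    }
    return map {
        my $row = $_;
        join "\t", $row,
            map { $_, sprintf('%.2f', $ar[$row]{$_} / $ar[0]) }
            sort keys %{ $ar[$row] };
    } 1 .. $#ar;
}

# Made-up sample input, not the original freq.txt.
print "$_\n" for position_frequencies('AB', 'AC');
```

With those two sample lines it prints the row 1, A, 1.00 and then the row 2, B, 0.50, C, 0.50, with the fields separated by tabs. To run it on a real file you could pass the chomped lines read from that file instead of the literals.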
L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.