I'm wondering if there's a simple way I could shoehorn this feature in without a full rewrite. Basically I want to just throw in a quick calculation of changes between columns.
Here's a trimmed down example of what I'm doing:
Read a file of num to name mappings:
num|name
1|foo
12|bar
15|bar
18|baz
25|quux
27|quux
37|quuux
48|quuuux
Read any number of files (granted, it's rarely more than a handful) that look like this:
file1:
acct|description|nums
foo-001|foo one|1,12,27,37
foo-002|foo two|1,15,25,37
foo-003|foo three|1,18,25,37
foo-004|foo four|12,18,25,37
foo-005|foo five|12,15,25,27,37
foo-006|foo six|1,12,25,27,37,99
file2:
acct|description|nums
foo-001|foo one|1,12,15,27,37
foo-002|foo two|1,15,25,37
foo-003|foo three|1,18,25
foo-004|foo four|12,18,25,37
foo-005|foo five|12,15,25,27,37
foo-006|foo six|1,12,15,25,27,37,99
file3:
acct|description|nums
foo-001|foo one|1,12,15,18,27,37
foo-002|foo two|1,15,25,37
foo-003|foo three|1,18,25
foo-004|foo four|12,18,25,37
foo-005|foo five|12,15,25,27,37
foo-006|foo six|1,12,15,25,27,37,99
I iterate and increment values in a hashref as I go so it ends up looking like:
$hash => {
    file1 => {
        foo    => 4,
        bar    => 6,
        baz    => 4,
        quux   => 8,
        quuux  => 6,
        quuuux => 0,
    },
    file2 => {
        foo    => 4,
        bar    => 8,
        baz    => 4,
        quux   => 8,
        quuux  => 5,
        quuuux => 0,
    },
    file3 => {
        foo    => 4,
        bar    => 8,
        baz    => 5,
        quux   => 8,
        quuux  => 5,
        quuuux => 0,
    },
}
And then I simply produce a report like this to make it easy to spot the differences:
name    file1  file2  file3
foo     4      4      4
bar     6      8      8
baz     4      4      5
quux    8      8      8
quuux   6      5      5
quuuux  empty  empty  empty
What I'd like to do is something like this:
name    file1  file2  %change  file3  %change
foo     4      4      0%       4      0%
bar     6      8      33%      8      0%
baz     4      4      0%       5      25%
quux    8      8      0%       8      0%
quuux   6      5      -17%     5      0%
quuuux  empty  empty           empty
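For reference, here is a minimal sketch of the per-cell calculation, assuming "%change" means (new - old) / old * 100 rounded to a whole percent, so that 6 to 8 is 33% and 6 to 5 is -17%. The name pct_change is hypothetical and not part of my existing code. Returning an empty string when the old count is missing or zero, and treating a missing new count as 0, are both assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage change from $old to $new, rounded to a
# whole percent. Returns '' when the old count is missing or zero, because
# the change is undefined in that case (assumption: leave the cell blank).
sub pct_change {
    my ( $old, $new ) = @_;
    return '' unless $old;
    $new = 0 unless defined $new;    # assumption: a missing count means 0
    return sprintf '%.0f%%', ( $new - $old ) / $old * 100;
}

print pct_change( 6, 8 ), "\n";    # bar,   file1 -> file2
print pct_change( 6, 5 ), "\n";    # quuux, file1 -> file2
print pct_change( 4, 4 ), "\n";    # foo,   file1 -> file2
```

With 300,000+ line files the counts will be large, so one decimal place ('%.1f%%') might be more useful than whole percents.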
My code is very straightforward, the only difference being that I'm handling files with 20+ columns, and each file is 300,000+ lines.
I'm populating %$hash exactly as you might expect, opening each file, iterating, if $hash->{$file}->{$name} isn't defined I define it, otherwise I increment $hash->{$file}->{$name}++. (The other subtle differences are intentional, like how 99 appears on some of the rows, but since it doesn't appear in the mapping of nums to names, I don't include it in the report.)
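As a small sketch of that counting step, here it is on two rows of file1 held in memory, keyed by name as described above. It assumes the mapping is already loaded into a hashref called $nums_to_names. It also relies on the fact that ++ on a hash element that doesn't exist yet autovivifies it and counts from 1, so the explicit "is it defined" branch is optional.

```perl
use strict;
use warnings;

# Mapping copied from the example above.
my $nums_to_names = {
    1  => 'foo',  12 => 'bar',  15 => 'bar',   18 => 'baz',
    25 => 'quux', 27 => 'quux', 37 => 'quuux', 48 => 'quuuux',
};

# Two rows from file1, held in memory so the sketch needs no files.
my @rows = (
    'foo-001|foo one|1,12,27,37',
    'foo-006|foo six|1,12,25,27,37,99',
);

my $hash = {};
for my $row (@rows) {
    my ( $acct, $description, $nums ) = split /\|/, $row;
    for my $num ( split /,/, $nums ) {
        my $name = $nums_to_names->{$num};
        next unless defined $name;    # 99 has no mapping, so skip it
        $hash->{file1}{$name}++;      # autovivifies, then counts from 1
    }
}

print "$_ => $hash->{file1}{$_}\n" for sort keys %{ $hash->{file1} };
```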
It doesn't seem like trying to calculate differences inside the same loop I'm using to iterate the files is the way to go.
I only see a couple of possible paths, but I can't wrap my head around either of them very well. Should I iterate the resulting %$hash after I finish creating it and make a new hash out of the results? Or, while I populate %$hash, should I also somehow populate an additional hash to make it easy to calculate later?
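To make the first path concrete, here is a rough sketch of a pass over the finished hash. Each %change cell only needs the counts from two adjacent files, so it may be that no second hash is needed at all and the percentages can be computed while the report rows are built. The name rows_with_change is hypothetical, and the formula (new - old) / old * 100 is my assumption about what "%change" should mean.

```perl
use strict;
use warnings;

# Counts copied from the example above (a few names only, for brevity).
my $hash = {
    file1 => { foo => 4, bar => 6, quuux => 6 },
    file2 => { foo => 4, bar => 8, quuux => 5 },
    file3 => { foo => 4, bar => 8, quuux => 5 },
};

# Hypothetical helper: build one report row per name, with a %change
# cell after every file except the first.
sub rows_with_change {
    my ( $counts, $names ) = @_;
    my @files = sort keys %$counts;
    my @rows;
    for my $name (@$names) {
        my @row = ($name);
        my $prev;
        for my $i ( 0 .. $#files ) {
            my $cur = $counts->{ $files[$i] }{$name};
            push @row, $cur ? $cur : 'empty';
            if ( $i > 0 ) {
                # Blank cell when there is no previous count to compare to.
                my $change = $prev
                    ? sprintf( '%.0f%%', ( ( $cur || 0 ) - $prev ) / $prev * 100 )
                    : '';
                push @row, $change;
            }
            $prev = $cur;
        }
        push @rows, \@row;
    }
    return \@rows;
}

my $rows = rows_with_change( $hash, [qw(foo bar quuux quuuux)] );
print join( "\t", @$_ ), "\n" for @$rows;
```

The heading would need a matching '%change' entry after every filename except the first.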
Any advice appreciated.
Just to include the obligatory code sample, it goes something like this:
my $hash    = {};
my @heading = ('name');
my @report;

# Read in each file and populate the hash
#
for my $fn ( sort @filenames ) {
    open my $fh, '<', $fn
        or die "Error opening file ${fn}: $!\n";

    while (<$fh>) {
        chomp;
        s%\r%%;
        my @line    = split /\|/;
        my @curnums = split( ',', $line[2] );

        for my $curnum (@curnums) {
            # Skip numbers that have no entry in the mapping (e.g. 99)
            next unless defined $nums_to_names->{$curnum};
            if ( !defined $hash->{$fn} or !defined $hash->{$fn}->{$curnum} ) {
                $hash->{$fn}->{$curnum} = 1;
            }
            else {
                $hash->{$fn}->{$curnum}++;
            }
        }
    }
}

# Iterate the mapping of numbers to names, once all files are counted
#
for my $curnum ( sort keys %$nums_to_names ) {
    my @report_line;

    # Skip it unless it's defined and has a value
    #
    my $name =
        defined $nums_to_names->{$curnum}
        && $nums_to_names->{$curnum}
        ? $nums_to_names->{$curnum}
        : next;
    push @report_line, $name;

    # For the current mapping number, pluck the corresponding counts
    # related to each file
    #
    for my $curfilename ( sort keys %$hash ) {
        my $count =
            defined $hash->{$curfilename}->{$curnum}
            && $hash->{$curfilename}->{$curnum}
            ? commify( $hash->{$curfilename}->{$curnum} )
            : 'empty';
        push @report_line, $count;
    }
    push @report, \@report_line;
}

push @heading, basename($_) for sort @filenames;
Then I iterate @heading and @report and print them out cell by cell.
Any tips on how I might add the percentage change between columns?