
I'm wondering if there's a simple way I could shoehorn this feature in without a full rewrite. Basically I want to just throw in a quick calculation of changes between columns.

Here's a trimmed down example of what I'm doing:

Read a file of num to name mappings:

num|name
1|foo
12|bar
15|bar
18|baz
25|quux
27|quux
37|quuux
48|quuuux

Read any number of files (granted, it's rarely more than a handful) that look like this:

file1:
acct|description|nums
foo-001|foo one|1,12,27,37
foo-002|foo two|1,15,25,37
foo-003|foo three|1,18,25,37
foo-004|foo four|12,18,25,37
foo-005|foo five|12,15,25,27,37
foo-006|foo six|1,12,25,27,37,99

file2:
acct|description|nums
foo-001|foo one|1,12,15,27,37
foo-002|foo two|1,15,25,37
foo-003|foo three|1,18,25
foo-004|foo four|12,18,25,37
foo-005|foo five|12,15,25,27,37
foo-006|foo six|1,12,15,25,27,37,99

file3:
acct|description|nums
foo-001|foo one|1,12,15,18,27,37
foo-002|foo two|1,15,25,37
foo-003|foo three|1,18,25
foo-004|foo four|12,18,25,37
foo-005|foo five|12,15,25,27,37
foo-006|foo six|1,12,15,25,27,37,99

I iterate and increment values in a hashref as I go so it ends up looking like:

$hash => {
    file1 => {
                foo    => 4,
                bar    => 6,
                baz    => 4,
                quux   => 8,
                quuux  => 6,
                quuuux => 0,
             },

    file2 => {
                foo    => 4,
                bar    => 8,
                baz    => 4,
                quux   => 8,
                quuux  => 5,
                quuuux => 0,
             },

    file3 => {
                foo    => 4,
                bar    => 8,
                baz    => 5,
                quux   => 8,
                quuux  => 5,
                quuuux => 0,
             },
}

Then I simply produce a report like this to make it easy to spot the differences:

name    file1   file2   file3
foo     4       4       4
bar     6       8       8
baz     4       4       5
quux    8       8       8
quuux   6       5       5
quuuux  empty   empty   empty

What I'd like to do is something like this:

name    file1   file2   %change  file3   %change
foo     4       4       0%       4       0%
bar     6       8       133%     8       0%
baz     4       4       0%       5       125%
quux    8       8       0%       8       0%
quuux   6       5       -83%     5       0%
quuuux  empty   empty            empty

My code is very straightforward; the only difference is that I'm handling files with 20+ columns, and each file is 300,000+ lines.

I'm populating %$hash exactly as you might expect: I open each file and iterate over it; if $hash->{$file}->{$name} isn't defined I set it to 1, otherwise I increment it with $hash->{$file}->{$name}++. (The other subtle differences are intentional, like how 99 appears on some of the rows, but since it doesn't appear in the mapping of nums to names, I don't include it in the report.)
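As an aside on the counting step: Perl autovivifies nested hash entries, and ++ on an undefined value yields 1 without a warning, so the defined check is probably unnecessary. A minimal sketch, where %names and @rows are hypothetical stand-ins for the real mapping and one input file:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real mapping and one file's rows.
my %names = ( 1 => 'foo', 12 => 'bar', 27 => 'quux', 37 => 'quuux' );
my @rows  = ( 'foo-001|foo one|1,12,27,37', 'foo-006|foo six|1,12,99' );

my %count;
for my $row (@rows) {
    my ( undef, undef, $nums ) = split /\|/, $row;
    for my $num ( split /,/, $nums ) {
        next unless defined $names{$num};    # 99 is unmapped, so it is skipped
        $count{file1}{ $names{$num} }++;     # autovivifies; undef++ becomes 1
    }
}

printf "%s => %d\n", $_, $count{file1}{$_} for sort keys %{ $count{file1} };
```

This sketch keys the counts by name rather than by number, to match the example hash shown earlier.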

It doesn't seem like trying to calculate differences inside the same loop I'm using to iterate the files is the way to go.

I only see a couple of possible paths, but I can't wrap my head around either of them very well. Should I iterate the resulting %$hash after I finish creating it and make a new hash out of the results? Or, while I populate %$hash, should I also somehow populate an additional hash to make it easy to calculate later?
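The first path (a second pass over the finished hash) might look roughly like the sketch below. It is only an illustration: pct_change and %change are hypothetical names, the sample counts are copied from the example hash above, and the formula is the usual signed change, 100 * (new - old) / old.

```perl
use strict;
use warnings;

# Signed percentage change from $old to $new. Returns undef when there
# is no usable baseline (missing or zero old value, or missing new value).
sub pct_change {
    my ( $old, $new ) = @_;
    return unless $old && defined $new;
    return 100 * ( $new - $old ) / $old;
}

# Sample counts copied from the example hash above (three names only).
my $hash = {
    file1 => { foo => 4, bar => 6, quuux => 6 },
    file2 => { foo => 4, bar => 8, quuux => 5 },
    file3 => { foo => 4, bar => 8, quuux => 5 },
};

my @files = sort keys %$hash;
my %change;    # $change{$file}{$name}: change relative to the previous file

for my $i ( 1 .. $#files ) {
    my ( $prev, $cur ) = @files[ $i - 1, $i ];
    for my $name ( keys %{ $hash->{$cur} } ) {
        $change{$cur}{$name} =
            pct_change( $hash->{$prev}{$name}, $hash->{$cur}{$name} );
    }
}

# Print one row of changes per name, one column per file after the first.
for my $name (qw(foo bar quuux)) {
    my @cells = map {
        defined $change{$_}{$name} ? sprintf( '%.0f%%', $change{$_}{$name} ) : ''
    } @files[ 1 .. $#files ];
    print join( "\t", $name, @cells ), "\n";
}
```

With this formula, 6 to 8 comes out as 33% and 6 to 5 as -17%. The 133% and -83% in the mock-up report look closer to the ratio new / old, so the body of pct_change would need to change if the ratio is what is wanted. Since the second pass only touches the small per-file totals, its cost should be negligible next to reading the 300,000+ line files.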

Any advice appreciated.

Just to include the obligatory code sample, it goes something like this:

my $hash = {};
my @heading = ( 'name' );
my @report;
for my $fn ( sort @filenames ) {

    open my $fh, '<', $fn
        or die "Error opening file ${fn}: $!\n";

    # Read in each filename and populate the hash
    #
    while (<$fh>) {
        chomp;
        s%\r%%;
        my @line = split /\|/;
        my @curnums = split( ',', $line[2] );

        for my $curnum ( @curnums ) {
            next unless defined $nums_to_names->{$curnum};
            if ( ! defined $hash->{$fn} or ! defined $hash->{$fn}->{$curnum} ) {
                $hash->{$fn}->{$curnum} = 1;
            } else {
                $hash->{$fn}->{$curnum}++;
            }
        }
    }

    # Iterate the mapping of numbers to names
    #
    for my $curnum ( sort keys %$nums_to_names ) {

        my @report_line;

        # Skip it unless it's defined and has a value
        #
        my $name =
            defined $nums_to_names->{$curnum}
            &&      $nums_to_names->{$curnum}
                  ? $nums_to_names->{$curnum}
                  : next
                  ;

        push @report_line, $name;


        # For the current mapping number, pluck the corresponding counts
        # related to each file
        #
        for my $curfilename ( sort keys %$hash ) {

             my $count =
                        defined $hash->{$curfilename}->{$curnum}
                        &&      $hash->{$curfilename}->{$curnum}
                      ? commify($hash->{$curfilename}->{$curnum})
                      : 'empty'
                      ;

            push @report_line, $count;

        }

        push @report, \@report_line;

    }

}

push @heading, basename($_) for sort @filenames;

Then I iterate @heading and @report and print them out cell by cell.
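For the printing step, a minimal sketch of fixed-width output follows. The helper format_row and the sample rows are hypothetical; the data simply mirrors the shape of @heading and @report with one %change column added.

```perl
use strict;
use warnings;

# Hypothetical report data in the same shape as @heading and @report.
my @heading = qw(name file1 file2 %change);
my @report  = (
    [ 'foo',    4,       4,       '0%' ],
    [ 'bar',    6,       8,       '33%' ],
    [ 'quuuux', 'empty', 'empty', '' ],
);

# Format one row as left-justified, fixed-width cells.
sub format_row {
    my ( $width, @cells ) = @_;
    my $line = join '', map { sprintf '%-*s', $width, $_ } @cells;
    $line =~ s/\s+$//;    # drop trailing padding
    return $line;
}

print format_row( 8, @$_ ), "\n" for \@heading, @report;
```

An empty string in the %change cell keeps the columns aligned for rows with no usable baseline, such as quuuux.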

Any tips on how I might add the percentage change between columns?

--
Andy

In reply to Calculating percentage of change between columns by naChoZ
