Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

I'm wondering if there's a simple way I could shoehorn this feature in without a full rewrite. Basically I want to just throw in a quick calculation of changes between columns.

Here's a trimmed down example of what I'm doing:

Read a file of num to name mappings:

num|name 1|foo 12|bar 15|bar 18|baz 25|quux 27|quux 37|quuux 48|quuuux

Read any number of files (granted, it's rarely more than a handful) that look like this:

file1: acct|description|nums foo-001|foo one|1,12,27,37 foo-002|foo two|1,15,25,37 foo-003|foo three|1,18,25,37 foo-004|foo four|12,18,25,37 foo-005|foo five|12,15,25,27,37 foo-006|foo six|1,12,25,27,37,99 file2: acct|description|nums foo-001|foo one|1,12,15,27,37 foo-002|foo two|1,15,25,37 foo-003|foo three|1,18,25 foo-004|foo four|12,18,25,37 foo-005|foo five|12,15,25,27,37 foo-006|foo six|1,12,15,25,27,37,99 file3: acct|description|nums foo-001|foo one|1,12,15,18,27,37 foo-002|foo two|1,15,25,37 foo-003|foo three|1,18,25 foo-004|foo four|12,18,25,37 foo-005|foo five|12,15,25,27,37 foo-006|foo six|1,12,15,25,27,37,99

I iterate and increment values in a hashref as I go so it ends up looking like:

$hash => { file1 => { foo => 4, bar => 6, baz => 4, quux => 8, quuux => 6, quuuux => 0, }, file2 => { foo => 4, bar => 8, baz => 4, quux => 8, quuux => 5, quuuux => 0, }, file3 => { foo => 4, bar => 8, baz => 5, quux => 8, quuux => 5, quuuux => 0, }, }

And then simply I produce a report like this to make it easy to spot the differences:

name file1 file2 file3 foo 4 4 4 bar 6 8 8 baz 4 4 5 quux 8 8 8 quuux 6 5 5 quuuux empty empty empty

What I'd like to do is something like this:

name file1 file2 %change file3 %change foo 4 4 0% 4 0% bar 6 8 133% 8 0% baz 4 4 0% 5 125% quux 8 8 0% 8 0% quuux 6 5 -83% 5 0% quuuux empty empty empty

My code is very straightforward, the only difference being I'm handling a file with 20+ columns and the each file is 300,000+ lines.

I'm populating %$hash exactly as you might expect, opening each file, iterating, if $hash->{$file}->{$name} isn't defined I define it, otherwise I increment $hash->{$file}->{$name}++. (The other subtle differences are intentional, like how 99 appears on some of the rows, but since it doesn't appear in the mapping of nums to names, I don't include it in the report.)

It doesn't seem like trying to calculate differences inside the same loop I'm using to iterate the files is the way to go.

I only see a couple of possible paths, but I'm can't wrap my head around either of them very well. Should I iterate the resulting %$hash after I finish creating it and make a new hash out of the results? Or, while I populate %$hash, should I also somehow populate an additional hash to make it easy to calculate later?

Any advice appreciated.

Just so to include the obligatory code sample it goes something like this:

my $hash = {}; my @heading = ( 'name' ); my @report; for my $fn ( sort @filenames ) { open my $fh, '<', $fn or die "Error opening file ${fn}: $!\n"; # Read in each filename and populate the hash # while (<$fh>) { chomp; s%\r%%; my @line = split /\|/; my @curnums = split( ',', $line[2] ); for my $curnum ( @curnums ) { next unless $defined $nums_to_names->{$curnum}; if ( ! defined $hash->{$fn} or ! defined $hash->{$fn}->{$c +urnum} ) { $hash->{$fn}->{$curnum} = 1; } else { $hash->{$fn}->{$curnum}++; } } } # Iterate the mapping of numbers to names # for my $curnum ( sort keys %$nums_to_names ) { my @report_line; # Skip it unless it's defined and has a value # my $name = defined $nums_to_names->{$curnum} && $nums_to_names->{$curnum} ? $nums_to_names->{$curnum} : next ; push @report_line, $name; # For the current mapping number, pluck the corresponding coun +ts # related to each file # for my $curfilename ( sort keys %$hash ) { my $count = defined $hash->{$curfilename}->{$curnum} && $hash->{$curfilename}->{$curnum} ? commify($hash->{$curfilename}->{$curnum}) : 'empty' ; push @report_line, $count; } push @report, \@report_line; } } push @heading, basename($_) for sort @filenames;
Then I iterate @heading and @report and print them out cell by cell.

Any tips on how I might add the percentage change between columns?

--
Andy


In reply to Calculating percentage of change between columns by naChoZ

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (2)
As of 2024-04-24 23:52 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found