PerlMonks  

Re^9: Computing results through Arrays

by aaron_baugher (Curate)
on Jun 07, 2015 at 14:37 UTC (#1129347)


in reply to Re^8: Computing results through Arrays
in thread Computing results through Arrays

Thank you, I'm glad we were able to get it working for you. I've learned a lot about style from this site too. By the way, I noticed a mistake: I was using too long a length in my substr() calls, so I was treating seconds as minutes and minutes as hours. I've corrected that in the version below.

On making the column widths dynamic, think about what you'll need to do. After getting all the values into your hash tables, you'll need to loop through them by column, finding the length of each value and saving the largest length found somewhere, matched to that column. We can save them as the values of our %db hash, since we just had '1' placeholders there before. So the keys of %db will still be the database names (and column headers), but later the values will become the column widths.

This is a bit complicated, and we need to do it twice, once for the hour report and once for the minute report, so I'll make it a subroutine. I pass the main hash (%h or %m) and the %db hash to it as references. I also pass a ref to the @db array of database names, so the subroutine doesn't have to re-get the keys from %db. Since the top-level keys of the hash are the datetimes, I have to loop through those first, then inside that I loop through the database names, decide which length is longer -- the one already saved for that column, or the length of the current item -- and save that in the database name hash. When it's finished, my main program can get the lengths for each column from the values in %db.

sub set_column_widths {
    my $h         = shift; # reference to hash table of speeds, keyed by datetime, then by database
    my $databases = shift; # reference to hash of database names, where we will set the width values
    my $names     = shift; # ref to array of database names
                           # (so we don't need to call keys on $databases repeatedly)
    for my $key (keys %$h){
        for my $db (@$names){
            my $l = length $h->{$key}{$db}; # get length of this item in the hash table
            # set column width to the widest length
            $databases->{$db} = $databases->{$db} > $l ? $databases->{$db} : $l;
        }
    }
    for my $db (@$names){ # check the width of the database names themselves too
        my $l = length $db;
        $databases->{$db} = $databases->{$db} > $l ? $databases->{$db} : $l;
    }
}

Now I just need to call that before printing each report, like this:

set_column_widths(\%h, \%db, \@db);

Now when it's time to print out the columns, we can get the width and use it in the printf() statements where I previously hard-coded 11. For instance, in this line which prints the database names:

printf "%11s", $_ for (@db);
# replace 11 with $db{$_}, the saved width for this column, and
# stick a space between columns
printf " %$db{$_}s", $_ for (@db);
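As an aside, printf can also take the width from its argument list with the `*` specifier, which avoids interpolating the width into the format string. The following is only a minimal sketch of that alternative: the %width hash stands in for the widths saved in %db, and format_row() is a hypothetical helper that is not part of the script in this thread.

```perl
use strict;
use warnings;

# Hypothetical widths, standing in for the values saved in %db
my %width = ( ASM => 3, DB101 => 5, _OTHER_DB_ => 10 );
my @names = sort keys %width;

# Build one row of output; "%*s" takes its width from the argument list
sub format_row {
    my ($widths, $names, $cells) = @_;
    return join '', map { sprintf ' %*s', $widths->{$_}, $cells->{$_} } @$names;
}

# Header row (each name printed under its own width) and one data row
my $header = format_row(\%width, \@names, { map { $_ => $_ } @names });
my $row    = format_row(\%width, \@names, { ASM => 4, DB101 => 100, _OTHER_DB_ => 120 });
print "$header\n$row\n";
```

Both forms should produce the same columns; the `*` form just keeps the format string constant.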

You'll need to make the same change in the other printf() statement, and then you should have dynamic-width columns. Here's the full script with these changes, in case it's not clear where I made them, plus the fixes for my substr() length mistake:

#!/usr/bin/env perl
use 5.010; use strict; use warnings;

my %h; my %m; my %db; # per-hour hash, per-minute hash, database names

sub set_column_widths {
    my $h         = shift; # reference to hash table of speeds, keyed by datetime, then by database
    my $databases = shift; # reference to hash of database names, where we will set the width values
    my $names     = shift; # ref to array of database names
                           # (so we don't need to call keys on $databases repeatedly)
    for my $key (keys %$h){
        for my $db (@$names){
            my $l = length $h->{$key}{$db}; # get length of this item in the hash table
            # set column width to the wider length
            $databases->{$db} = $databases->{$db} > $l ? $databases->{$db} : $l;
        }
    }
    for my $db (@$names){ # check the width of the database names themselves too
        my $l = length $db;
        $databases->{$db} = $databases->{$db} > $l ? $databases->{$db} : $l;
    }
}

while(<DATA>){
    next unless /\w/;                    # skip blank lines
    my($datetime,$database,$speed) = (split)[1,2,3];
    my $ddhhmm = substr $datetime,0,16;  # substr works well here since the lengths are static
    my $ddhh   = substr $datetime,0,13;  # this one doesn't include the minutes
    $h{$ddhh  }{$database} += $speed;    # add the speed to this hour & database
    $m{$ddhhmm}{$database} += $speed;    # add the speed to this minute & database
    $db{$database} = 1;                  # save the database name
}

my @db = sort keys %db; # sort and save database names as array since we'll be looping through them many times

# HOUR SECTION START
# calculate column widths
set_column_widths(\%h, \%db, \@db);
# print out the per-hour stats
# starting with a header line
print "Frequency Hour:\ncollectionTime";
printf " %$db{$_}s", $_ for (@db);  # print each database name as a header with dynamic width
print "\n";                         # end of line
for my $key (sort keys %h){
    print "$key ";                                # print the date/hour key
    printf " %$db{$_}s", $h{$key}{$_} for (@db);  # print the value for each database that goes with this key
    print "\n";
}
# HOUR SECTION END

# MINUTE SECTION START (using %m instead of %h)
# calculate column widths
set_column_widths(\%m, \%db, \@db);
# print out the per-minute stats
# starting with a header line
print "\nFrequency Minute:\n collectionTime";
printf " %$db{$_}s", $_ for (@db);  # print each database name as a header with dynamic width
print "\n";                         # end of line
for my $key (sort keys %m){
    print $key;                                   # print the date/hour/minute key
    printf " %$db{$_}s", $m{$key}{$_} for (@db);  # print the value for each database that goes with this key
    print "\n";
}
# MINUTE SECTION END

__DATA__
server01: 2015-06-01T12:40:03-04:00 DB101 10 MB/sec
server01: 2015-06-01T12:40:03-04:00 DB202 5 MB/sec
server01: 2015-06-01T12:40:03-04:00 ASM 2 MB/sec
server01: 2015-06-01T12:40:03-04:00 MYDB101 2 MB/sec
server01: 2015-06-01T12:40:03-04:00 MYDB202 5 MB/sec
server01: 2015-06-01T12:40:03-04:00 _OTHER_DB_ 30 MB/sec
server01: 2015-06-01T12:41:03-04:00 DB101 3 MB/sec
server01: 2015-06-01T12:41:03-04:00 DB202 4 MB/sec
server01: 2015-06-01T12:41:03-04:00 ASM 2 MB/sec
server01: 2015-06-01T12:41:03-04:00 MYDB101 9 MB/sec
server01: 2015-06-01T12:41:03-04:00 MYDB202 7 MB/sec
server01: 2015-06-01T12:41:03-04:00 _OTHER_DB_ 50 MB/sec
server02: 2015-06-01T12:40:03-04:00 DB101 90 MB/sec
server02: 2015-06-01T12:40:03-04:00 DB202 9 MB/sec
server02: 2015-06-01T12:40:03-04:00 ASM 2 MB/sec
server02: 2015-06-01T12:40:03-04:00 MYDB101 3 MB/sec
server02: 2015-06-01T12:40:03-04:00 MYDB202 1 MB/sec
server02: 2015-06-01T12:40:03-04:00 _OTHER_DB_ 90 MB/sec
server02: 2015-06-01T12:41:03-04:00 DB101 1 MB/sec
server02: 2015-06-01T12:41:03-04:00 DB202 4 MB/sec
server02: 2015-06-01T12:41:03-04:00 ASM 2 MB/sec
server02: 2015-06-01T12:41:03-04:00 MYDB101 7 MB/sec
server02: 2015-06-01T12:41:03-04:00 MYDB202 7 MB/sec
server02: 2015-06-01T12:41:03-04:00 _OTHER_DB_ 55 MB/sec

Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.

Replies are listed 'Best First'.
Re^10: Computing results through Arrays
by yasser8@gmail.com (Novice) on Jun 24, 2015 at 09:49 UTC

    Sorry to come back to this, Aaron.

    I now need to calculate the AVERAGE and MAXIMUM within each group, the same way we did for SUM earlier. I was able to draft the AVERAGE logic and it works fine, but I am not able to work out MAXIMUM. Could you please help me with this, and also tell me whether my approach for AVERAGE can be written more efficiently?

    AVERAGE logic: SUM($speed) / (DISTINCT($servers) * 60) for Hour

    MAXIMUM logic: MAX($speed) across all the $servers within each Hour

    #!/usr/bin/env perl
    use strict; use warnings;

    my %h; my %m; my %db; my %sr;

    sub round { $_[0] > 0 ? int($_[0] + .5) : -int(-$_[0] + .5) }

    sub fnd_max (\%) {
        my $hash = shift;
        my ($key, @keys) = keys %$hash;
        my ($big, @vals) = values %$hash;
        for (0 .. $#keys) {
            if ($vals[$_] > $big) {
                $big = $vals[$_];
                $key = $keys[$_];
            }
        }
        $big
    }

    sub set_column_widths {
        my $h         = shift;
        my $databases = shift;
        my $names     = shift;
        for my $key (keys %$h){
            for my $db (@$names){
                my $l = length $h->{$key}{$db};
                $databases->{$db} = $databases->{$db} > $l ? $databases->{$db} : $l;
            }
        }
        for my $db (@$names){ # check the width of the database names themselves too
            my $l = length $db;
            $databases->{$db} = $databases->{$db} > $l ? $databases->{$db} : $l;
        }
    }

    while(<DATA>){
        next unless /\w/;
        my($server,$datetime,$database,$speed) = (split)[0,1,2,3];
        my $ddhhmm = substr $datetime,0,16;
        my $ddhh   = substr $datetime,0,13;
        $h{$ddhh  }{$database} += $speed;
        $m{$ddhhmm}{$database} += $speed;
        $db{$database} = 1;
        $sr{$server  } = 1;
    }

    my @db = sort keys %db; # sort and save database names as array since we'll be looping through them many times
    my $count = keys %sr;

    # HOUR SECTION START - AVG
    for my $key (sort keys %h){
        for (@db) { $h{$key}{$_} = round($h{$key}{$_} / ($count * 60))} ;
    }
    set_column_widths(\%h, \%db, \@db);
    print "Frequency Hour:\ncollectionTime";
    printf " %$db{$_}s", $_ for (@db);
    print "\n";
    for my $key (sort keys %h){
        print "$key ";
        printf " %$db{$_}s", $h{$key}{$_} for (@db);
        print "\n";
    }
    # HOUR SECTION END - AVG

    # MINUTE SECTION START - AVG
    for my $key (sort keys %m){
        for (@db) { $m{$key}{$_} = round($m{$key}{$_} / ($count))} ;
    }
    set_column_widths(\%m, \%db, \@db);
    print "\nFrequency Minute:\n collectionTime";
    printf " %$db{$_}s", $_ for (@db);
    print "\n";
    for my $key (sort keys %m){
        print $key;
        printf " %$db{$_}s", $m{$key}{$_} for (@db);
        print "\n";
    }
    # MINUTE SECTION END - AVG

    # HOUR SECTION START - MAX
    set_column_widths(\%h, \%db, \@db);
    print "Frequency Hour:\ncollectionTime";
    printf " %$db{$_}s", $_ for (@db);
    print "\n";
    for my $key (sort keys %h){
        print "$key ";
        printf " %$db{$_}s", max (values $h{$key}{$_}) for (@db);
        print "\n";
    }
    # HOUR SECTION END - MAX

    # MINUTE SECTION START - MAX
    print fnd_max %m ;
    set_column_widths(\%m, \%db, \@db);
    print "\nFrequency Minute:\n collectionTime";
    printf " %$db{$_}s", $_ for (@db);
    print "\n";
    for my $key (sort keys %m){
        print $key;
        printf " %$db{$_}s", max (values $m{$key}{$_}) for (@db);
        print "\n";
    }
    # MINUTE SECTION END - MAX

      robby_dobby already pointed out the actual mistake: you need to pass fnd_max() a reference to the hash, since that's what fnd_max() is expecting. See how I pass my hashes to set_column_widths().

      Now beyond that: First, I'd use a more descriptive subroutine name, like "max_value_of_hash", and drop the prototype. Prototypes are advanced juju and shouldn't be used most of the time. Second, if you want to get the largest value from a hash, you don't need to access the keys at all. Here are some examples, starting with the simplest and wordiest:

      #!/usr/bin/env perl
      use 5.010; use strict; use warnings;

      # newbie but clean version
      # (starting $max at 0 assumes the values are never negative)
      sub max_value_of_hash {
          my $h = shift;
          my $max = 0;
          for my $v (values %$h){
              if ($v > $max){
                  $max = $v; # keep setting $max to larger value
              }
          }
          return $max;
      }

      # more perlish and elegant version
      sub max_value_of_hash2 {
          my $h = shift;
          my $max = 0;
          $_ > $max ? $max = $_ : undef for values %$h;
          return $max;
      }

      # let a module do it
      sub max_value_of_hash3 {
          use List::Util qw(max);
          return max values %{$_[0]};
      }

      my %hash = ( a => 1, b => 2, c => 5, d => 3 );
      say max_value_of_hash( \%hash);
      say max_value_of_hash2(\%hash);
      say max_value_of_hash3(\%hash);

      Aaron B.
      Available for small or large Perl jobs and *nix system administration; see my home node.

        Sorry, I was not clear about my requirement, and I also made a bad mistake in my code.

        My requirement is to print the maximum value the same way the average value is printed. So the value assigned to each hash key should be the maximum value in that group instead of the sum of the values.

        while(<DATA>){
            next unless /\w/;
            my($server,$datetime,$database,$speed) = (split)[0,1,2,3];
            my $ddhhmm = substr $datetime,0,16;
            my $ddhh   = substr $datetime,0,13;
            $h{$ddhh  }{$database} += $speed;
            $m{$ddhhmm}{$database} += $speed;
            $db{$database} = 1;
            $sr{$server  } = 1;
        }

        Is there a way to assign the maximum value to $h{$ddhh}{$database} and $m{$ddhhmm}{$database} in the while loop shown above?

        If this is possible, then I can follow the same procedure to print the max values as we did for the average.

        One more question: all I did to find the average was to loop through the hash one more time and assign the average value to it, as shown below:

        for my $key (sort keys %h){
            for (@db) { $h{$key}{$_} = round($h{$key}{$_} / ($count * 60))} ;
        }

        Can this be done efficiently without looping through the hash again? I mean, can it be done in the while loop itself?
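One possible way to approach both questions, offered only as a sketch: the maximum can be tracked inside the read loop by comparing each speed with the largest one stored so far, while the average has to wait until the loop ends, because the number of distinct servers is not known earlier. The division can then happen at print time instead of in a separate pass that rewrites the hash. The names %sum, %max and %servers are hypothetical, the input is a small sample in the thread's data format, and only the per-hour grouping is shown.

```perl
use strict;
use warnings;

# A few sample lines in the same format as the thread's __DATA__ section
my @lines = (
    'server01: 2015-06-01T12:40:03-04:00 DB101 10 MB/sec',
    'server01: 2015-06-01T12:41:03-04:00 DB101 3 MB/sec',
    'server02: 2015-06-01T12:40:03-04:00 DB101 90 MB/sec',
    'server02: 2015-06-01T12:41:03-04:00 DB101 1 MB/sec',
);

my (%sum, %max, %servers);   # hypothetical names, not from the original script

for my $line (@lines) {
    my ($server, $datetime, $database, $speed) = (split ' ', $line)[0 .. 3];
    my $ddhh = substr $datetime, 0, 13;

    # running total, as in the original script
    $sum{$ddhh}{$database} += $speed;

    # keep the largest speed seen so far for this hour and database
    my $seen = $max{$ddhh}{$database};
    $max{$ddhh}{$database} = $speed if !defined $seen || $speed > $seen;

    $servers{$server} = 1;
}

# The divisor is only known after the loop, so divide while printing
my $count = keys %servers;
for my $ddhh (sort keys %sum) {
    for my $database (sort keys %{ $sum{$ddhh} }) {
        printf "%s %s avg=%.2f max=%d\n", $ddhh, $database,
            $sum{$ddhh}{$database} / ($count * 60),
            $max{$ddhh}{$database};
    }
}
```

One caveat: set_column_widths() measures the values stored in the hash, so if the averages are only computed while printing, the widths would be based on the sums rather than on the printed averages.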

      Hello yasser8@gmail.com,

      The problem here is in this line:

      print fnd_max %m

      You see, the fnd_max sub expects a hashref, while the call is written with a plain hash. Also, why are you using a prototype for your subroutine? Prototypes are not the same thing as the subroutine signatures introduced (as an experimental feature) in v5.20, and most of the time you don't need them. To correct your problem, drop the (\%) prototype and change the %m to \%m. I'd also make it clear that it's a sub call: print fnd_max(\%m). That last part isn't really important; it's just my convention.
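One more thing to keep in mind is that %m is a hash of hashes, so taking the maximum of its top-level values would compare hash references rather than speeds. A minimal sketch of passing a reference without a prototype and taking the maximum of each inner hash follows. The helper max_per_key() is hypothetical, and the sample values only imitate the shape of %m.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Sample data in the shape of %m: minute key, then database name
my %m = (
    '2015-06-01T12:40' => { DB101 => 100, DB202 => 14 },
    '2015-06-01T12:41' => { DB101 => 4,   DB202 => 8  },
);

# No prototype: the caller passes a hash reference explicitly.
# Returns a reference to a hash holding the largest inner value per key.
sub max_per_key {
    my $h = shift;
    my %out = map { $_ => max(values %{ $h->{$_} }) } keys %$h;
    return \%out;
}

my $max = max_per_key(\%m);
print "$_ $max->{$_}\n" for sort keys %$max;
```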
