 
PerlMonks  

Re^5: Computing results through Arrays

by aaron_baugher (Curate)
on Jun 05, 2015 at 18:44 UTC (node #1129235)


in reply to Re^4: Computing results through Arrays
in thread Computing results through Arrays

You're getting close! The main problem is with your loop logic. You want to print a header line starting with "collectionTime", followed by the database names. You can do that with something like this:

    print " collectionTime";
    for my $db_keys (sort keys %db){
        print " $db_keys";  # adjust spaces to line things up
    }
    print "\n";

Now you want to start going through the actual data, printing it so that it lines up with the headers. So this loop follows the previous one, instead of being inside it:

    for my $h_keys (sort keys %h){
        print $h_keys;  # print the date/hour
        for my $db_keys (sort keys %db){
            print " $h{$h_keys}{$db_keys}";  # pad with enough spaces to match header
        }
        print "\n";  # this goes outside the inner loop, to end the line
    }

I haven't tested that, but it's just a bit of an adjustment to what you had. Once it works, the next thing you'll probably want to look at is replacing the print statements with printf, which will help you line things up in columns even though the values are of different lengths.
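Here is a minimal sketch of the printf idea with fixed column widths. The sample %h and %db contents and the widths (16 characters for the time column, 11 for each database column) are made up for illustration. It builds each line with sprintf, which takes the same format string as printf but returns the text instead of printing it, so the layout can be checked before it goes out.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample data in the same shape as %h and %db above:
# $h{hour}{database} = summed speed
my %h = (
    '2015-06-01T12' => { ASM => 8, DB101 => 104, DB202 => 22 },
    '2015-06-01T13' => { ASM => 4, DB101 => 7,   DB202 => 130 },
);
my %db = map { $_ => 1 } qw(ASM DB101 DB202);

# Build one fixed-width line: a left-aligned label, then right-aligned cells
sub format_row {
    my ($label, @cells) = @_;
    my $line = sprintf "%-16s", $label;       # label padded to 16 characters
    $line .= sprintf "%11s", $_ for @cells;   # each cell right-aligned in 11 characters
    return "$line\n";
}

print format_row('collectionTime', sort keys %db);
for my $hour (sort keys %h) {
    print format_row($hour, map { $h{$hour}{$_} } sort keys %db);
}
```

Swapping each `sprintf` for `printf` (and dropping the returned string) gives the direct-printing version.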

One more thought: for efficiency's sake, we should probably sort the %db hash keys once and put them in an array, rather than re-sorting them every time we print a line. But it'll work this way, so we can deal with that next time.
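A minimal sketch of that change: sort the names once into @db before the report loop, then reuse the array for the header and for every row. The sample data and the sub name report_lines are hypothetical, and printing 0 for a database with no data in that hour (via `// 0`, Perl 5.10 or later) is an assumption added here, not part of the code above.

```perl
#!/usr/bin/env perl
use 5.010;    # for the // (defined-or) operator
use strict;
use warnings;

# Hypothetical per-hour totals: $h{hour}{database} = summed speed
my %h = (
    '2015-06-01T12' => { DB101 => 104, ASM => 8 },
    '2015-06-01T13' => { DB101 => 7,   ASM => 4 },
);
my %db = (DB101 => 1, ASM => 1);

# Build the report as a list of lines, sorting the database names only once
sub report_lines {
    my ($h, $db) = @_;
    my @db    = sort keys %$db;    # sorted once, not once per row
    my @lines = (join ' ', 'collectionTime', @db);
    for my $key (sort keys %$h) {
        # 0 when a database has no entry for this hour
        push @lines, join ' ', $key, map { $h->{$key}{$_} // 0 } @db;
    }
    return @lines;
}

print "$_\n" for report_lines(\%h, \%db);
```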

Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.

Re^6: Computing results through Arrays
by yasser8@gmail.com (Novice) on Jun 05, 2015 at 20:51 UTC

    Thanks a lot for your guidance, Sir. Here is the code with the latest updates:

        #!/usr/bin/env perl
        use strict;
        use warnings;
        my %h;
        my %m;
        my %db;
        while(<DATA>){
            chomp;
            my @fields = split;
            my ($date,$database_name,$speed) = @fields[1,2,3];
            my ($date_hour,$minute) = split (/:/,$date) ;
            my $date_hour_minute = join (':',$date_hour,$minute) ;
            $h{$date_hour}{$database_name} += $speed;
            $m{$date_hour_minute}{$database_name} += $speed;
            $db{$database_name} = 1;
        }
        print " collectionTime";
        for my $db_keys (sort keys %db){
            print " $db_keys"; # adjust spaces to line things up
        }
        print "\n";
        for my $h_keys (sort keys %h){
            print $h_keys; # print the date/hour
            for my $db_keys (sort keys %db){
                print " $h{$h_keys}{$db_keys}"; # pad with enough spaces to match header
            }
            print "\n"; # this goes outside the inner loop, to end the line
        }

    The number of databases at any time will be less than 10. Do you think it will still be useful to sort the %db hash keys once and put them in an array here?

    One more question: the total amount of data to be processed will be around 100k records. Do you think this approach will be efficient enough? Please let me know if you see room for improving efficiency.

      I can't tell for sure since you didn't show your __DATA__ section, but since it's complaining about not having a string to split after chomp, it's probably running into a blank line in your __DATA__ section. You can test for that before splitting (see below). On your other question, yes, it's worth sorting the database names into an array, not because of how many databases there are, but because you don't want to sort them again for every line of output, as my pseudo-code did; with 100_000 records, that can be a lot of lines.
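A small sketch of the blank-line problem and the guard: on an empty line, `split` returns an empty list, so every field taken from it is undef, and later uses of those fields (such as splitting the date on ':') trigger warnings. The helper name parse_line and the sample input are hypothetical. It tests with /\S/ (any non-whitespace character); the code below uses /\w/, which works just as well for this data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns (datetime, database, speed),
# or an empty list for a blank or whitespace-only line.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ /\S/;    # nothing to split: skip it
    my ($datetime, $database, $speed) = (split ' ', $line)[1, 2, 3];
    return ($datetime, $database, $speed);
}

my @input = (
    "server01: 2015-06-01T12:40:03-04:00 DB101 10 MB/sec\n",
    "\n",
    "server01: 2015-06-01T12:40:03-04:00 ASM 2 MB/sec\n",
);

for my $line (@input) {
    my @fields = parse_line($line) or next;    # blank line: no fields, go to the next line
    print join(',', @fields), "\n";
}
```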

      I went ahead and did a working version that pulls in the data and produces the per-hour report like I think you want it. I added a lot of comments, but feel free to ask about anything you don't understand. I think you should be able to add the per-minute section yourself (see the comments for where), based on how the per-hour section works.

      Once you're comfortable with how it works, one way to make the printing nicer would be to calculate the width of each column (for printf) based on the maximum width of the items in that column. I didn't get into that here, to keep it simple.
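A minimal sketch of sizing each column from its contents: a column's width is the length of its longest item, counting the header, and `%*s` / `%-*s` take that width as an extra sprintf argument. The sub names column_widths and format_report and the sample data are hypothetical, missing values are printed as 0 (an assumption), and `max` comes from List::Util, a core module.

```perl
#!/usr/bin/env perl
use 5.010;    # for the // (defined-or) operator
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical per-hour totals: $h{hour}{database} = summed speed
my %h = (
    '2015-06-01T12' => { ASM => 8, DB101 => 104, _OTHER_DB_ => 225 },
    '2015-06-01T13' => { ASM => 4, DB101 => 12345678 },
);
my @db = sort qw(ASM DB101 _OTHER_DB_);

# Width of each column = length of its longest item (header included)
sub column_widths {
    my ($h, $db) = @_;
    my @keys  = sort keys %$h;
    my $first = max map { length($_) } 'collectionTime', @keys;
    my @widths = map {
        my $name = $_;
        max length($name), map { length($h->{$_}{$name} // 0) } @keys;
    } @$db;
    return ($first, @widths);
}

# Each cell is padded to exactly its column's width, with one space between cells
sub format_report {
    my ($h, $db) = @_;
    my ($first, @widths) = column_widths($h, $db);
    my $out = sprintf "%-*s", $first, 'collectionTime';
    $out .= sprintf " %*s", $widths[$_], $db->[$_] for 0 .. $#$db;
    $out .= "\n";
    for my $key (sort keys %$h) {
        $out .= sprintf "%-*s", $first, $key;
        $out .= sprintf " %*s", $widths[$_], $h->{$key}{ $db->[$_] } // 0 for 0 .. $#$db;
        $out .= "\n";
    }
    return $out;
}

print format_report(\%h, \@db);
```

Because every cell is padded to exactly its column's width and cells are separated by one space, columns with short items end up a single space apart, and a long value only widens its own column.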

          #!/usr/bin/env perl
          use 5.010;
          use strict;
          use warnings;
          my %h; my %m; my %db; # per-hour hash, per-minute hash, database names
          while(<DATA>){
              next unless /\w/; # skip blank lines
              my($datetime,$database,$speed) = (split)[1,2,3];
              # substr works well here since the lengths are static
              my $ddhhmm = substr $datetime,0,16; # "2015-06-01T12:40", includes the minutes
              my $ddhh   = substr $datetime,0,13; # "2015-06-01T12", doesn't include the minutes
              $h{$ddhh}{$database}   += $speed; # add the speed to this hour & database
              $m{$ddhhmm}{$database} += $speed; # add the speed to this minute & database
              $db{$database} = 1;               # save the database name
          }
          # sort and save database names as array since we'll be looping through them many times
          my @db = sort keys %db;
          # HOUR SECTION START
          # print out the per-hour stats
          # starting with a header line
          printf "%-16s", "collectionTime";
          printf "%11s", $_ for (@db); # print each database name as a header, right-aligned in 11 spaces
          print "\n";                  # end of line
          for my $key (sort keys %h){
              printf "%-16s", $key;    # print the date/hour key, padded to match the header
              printf "%11s", $h{$key}{$_} // 0 for (@db); # print the value for each database that goes with this key (0 if none)
              print "\n";
          }
          # HOUR SECTION END
          # MINUTE SECTION START (using %m instead of %h)
          # MINUTE SECTION END
          __DATA__
          server01: 2015-06-01T12:40:03-04:00 DB101 10 MB/sec
          server01: 2015-06-01T12:40:03-04:00 DB202 5 MB/sec
          server01: 2015-06-01T12:40:03-04:00 ASM 2 MB/sec
          server01: 2015-06-01T12:40:03-04:00 MYDB101 2 MB/sec
          server01: 2015-06-01T12:40:03-04:00 MYDB202 5 MB/sec
          server01: 2015-06-01T12:40:03-04:00 _OTHER_DB_ 30 MB/sec
          server01: 2015-06-01T12:41:03-04:00 DB101 3 MB/sec
          server01: 2015-06-01T12:41:03-04:00 DB202 4 MB/sec
          server01: 2015-06-01T12:41:03-04:00 ASM 2 MB/sec
          server01: 2015-06-01T12:41:03-04:00 MYDB101 9 MB/sec
          server01: 2015-06-01T12:41:03-04:00 MYDB202 7 MB/sec
          server01: 2015-06-01T12:41:03-04:00 _OTHER_DB_ 50 MB/sec
          server02: 2015-06-01T12:40:03-04:00 DB101 90 MB/sec
          server02: 2015-06-01T12:40:03-04:00 DB202 9 MB/sec
          server02: 2015-06-01T12:40:03-04:00 ASM 2 MB/sec
          server02: 2015-06-01T12:40:03-04:00 MYDB101 3 MB/sec
          server02: 2015-06-01T12:40:03-04:00 MYDB202 1 MB/sec
          server02: 2015-06-01T12:40:03-04:00 _OTHER_DB_ 90 MB/sec
          server02: 2015-06-01T12:41:03-04:00 DB101 1 MB/sec
          server02: 2015-06-01T12:41:03-04:00 DB202 4 MB/sec
          server02: 2015-06-01T12:41:03-04:00 ASM 2 MB/sec
          server02: 2015-06-01T12:41:03-04:00 MYDB101 7 MB/sec
          server02: 2015-06-01T12:41:03-04:00 MYDB202 7 MB/sec
          server02: 2015-06-01T12:41:03-04:00 _OTHER_DB_ 55 MB/sec
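One way to add the per-minute section without duplicating the per-hour code is to move the printing into a subroutine that takes the stats hash as an argument. Here is a minimal sketch of that idea; the sub name report, the three sample lines, and the 16-character first column are assumptions for illustration. The hour and minute keys are cut with substr at 13 characters (2015-06-01T12) and 16 characters (2015-06-01T12:40).

```perl
#!/usr/bin/env perl
use 5.010;    # for the // (defined-or) operator
use strict;
use warnings;

# Hypothetical input lines in the same format as the __DATA__ section
my @input = (
    'server01: 2015-06-01T12:40:03-04:00 DB101 10 MB/sec',
    'server01: 2015-06-01T12:41:03-04:00 DB101 3 MB/sec',
    'server02: 2015-06-01T12:40:03-04:00 ASM 2 MB/sec',
);

my (%h, %m, %db);
for (@input) {
    next unless /\w/;                               # skip blank lines
    my ($datetime, $database, $speed) = (split)[1, 2, 3];
    my $hour   = substr $datetime, 0, 13;           # e.g. 2015-06-01T12
    my $minute = substr $datetime, 0, 16;           # e.g. 2015-06-01T12:40
    $h{$hour}{$database}   += $speed;
    $m{$minute}{$database} += $speed;
    $db{$database} = 1;
}
my @db = sort keys %db;

# Hypothetical helper: one report for any hash shaped like %h or %m
sub report {
    my ($stats, $db) = @_;
    my $out = sprintf "%-16s", 'collectionTime';
    $out .= sprintf "%11s", $_ for @$db;
    $out .= "\n";
    for my $key (sort keys %$stats) {
        $out .= sprintf "%-16s", $key;
        $out .= sprintf "%11s", $stats->{$key}{$_} // 0 for @$db;   # 0 if no data
        $out .= "\n";
    }
    return $out;
}

print report(\%h, \@db), "\n";   # per-hour section
print report(\%m, \@db);         # per-minute section
```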


        Once again, thanks a lot for all your help.

        You were right, there were some blank lines. I did thorough testing with Day, Hour, and Minute grouping, and now it's working perfectly, but as you said, the printing sometimes gets messy when the items in a column are wider. Is there a way to handle wider columns, and when they are not wider, to have only a single space between database names?

        I always get curious and motivated by your style of coding; it's very elegant and precise. I am very happy to see that my coding has improved a lot by following your style. Thanks a lot, Sir!
