http://qs321.pair.com?node_id=1129192


in reply to Re^2: Computing results through Arrays
in thread Computing results through Arrays

As Laurent_R said, your requirements have another dimension (or two), so my script won't work for this beyond some of the basic ideas. You're probably going to want three hashes: one to collect per-hour values (%h), one to collect per-minute values (%m), and one to collect database names (%db). Then you'll need to:

for each line
    parse out the date-hour, date-hour-minute, database name, and speed
    add speed to $h{date-hour}{database name}
    add speed to $m{date-hour-minute}{database name}
    $db{database name} = 1   # put database name in hash
loop through sorted keys of %db
    print them as headers, formatted to fit what's coming below
loop through keys of %h (sorted if you want)
    print the key (the date-hour)
    loop through sorted keys of %db
        print $h{$key}{database name}
    print a newline
now do the same with the per-minute hash %m
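The accumulation step above can be sketched roughly as follows. This is only an illustration: the record layout ("date-hour:minute database speed") and the sample values are assumptions, since the real input is not shown here. It relies on Perl creating the nested hash entries automatically the first time `+=` touches them.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample records: "date-hour:minute database speed".
# The real log layout is an assumption made for this sketch.
my @lines = (
    '2015-06-05-10:01 db1 5',
    '2015-06-05-10:01 db2 7',
    '2015-06-05-10:02 db1 3',
    '2015-06-05-11:00 db2 4',
);

my (%h, %m, %db);

for my $line (@lines) {
    my ($stamp, $name, $speed) = split ' ', $line;
    my ($date_hour) = split /:/, $stamp;   # part before the first colon

    $h{$date_hour}{$name} += $speed;       # per-hour total
    $m{$stamp}{$name}     += $speed;       # per-minute total
    $db{$name} = 1;                        # remember every database seen
}

# Quick check of the per-hour totals ("//" needs Perl 5.10 or later).
for my $hour (sort keys %h) {
    print "$hour db1=", $h{$hour}{db1} // 0, " db2=", $h{$hour}{db2} // 0, "\n";
}
```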

Try coding that, and let us know if you need help.

Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.

Re^4: Computing results through Arrays
by yasser8@gmail.com (Novice) on Jun 05, 2015 at 17:33 UTC

    Thanks a lot, Aaron Sir! I tried coding it the way you said, but I'm not sure where I am going wrong; I'm getting a lot of "Use of uninitialized value in string" errors. I tried debugging, but in vain. I also have no idea how to print the keys on a single line, so I am not able to meet the requirement "print them as headers, formatted to fit what's coming below". Please excuse these silly mistakes; I am still a beginner in Perl.

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my %h;
    my %m;
    my %db;

    while (<DATA>) {
        chomp;
        my @fields = split;
        my ($date, $database_name, $speed) = @fields[1,2,3];
        my ($date_hour, $minute) = split /:/, $date;
        my $date_hour_minute = join(':', $date_hour, $minute);
        $h{$date_hour}{$database_name} += $speed;
        $m{$date_hour_minute}{$database_name} += $speed;
        $db{$database_name} = 1;
    }

    for my $db_keys (sort keys %db) {
        print "$db_keys";
        for my $h_keys (sort keys %h) {
            print $h_keys;
            for my $db_keys (sort keys %db) {
                print "$h{$h_keys}{$db_keys}";
                print "\n";
            }
        }
    }

    Will be thankful to you if you could help me please...

      You're getting close! The main problem is with your loop logic. You want to print a header line starting with "collectionTime", followed by the database names. You can do that with something like this:

      print " collectionTime";
      for my $db_keys (sort keys %db) {
          print " $db_keys";   # adjust spaces to line things up
      }
      print "\n";

      Now you want to start going through the actual data, printing it so that it lines up with the headers. So this loop follows the previous one, instead of being inside it:

      for my $h_keys (sort keys %h) {
          print $h_keys;   # print the date/hour
          for my $db_keys (sort keys %db) {
              print " $h{$h_keys}{$db_keys}";   # pad with enough spaces to match header
          }
          print "\n";   # this goes outside the inner loop, to end the line
      }

      I haven't tested that, but it's just a bit of an adjustment to what you had. Once it works, the next thing you'll probably want to look at is replacing the print statements with printf, which will help you line things up in columns even though the values are of different lengths.
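      As a rough, untested-against-your-data sketch of the printf idea: the column widths (20 and 10), the helper name format_row, and the sample totals below are all made up for illustration. It also substitutes 0 where a database has no entry for an hour, which may be one source of the "Use of uninitialized value" warnings, though that is a guess without seeing the input.

```perl
use strict;
use warnings;

# Hypothetical totals, just for illustration; db2 has no entry for hour 11.
my %h = (
    '2015-06-05-10' => { db1 => 8, db2 => 7 },
    '2015-06-05-11' => { db1 => 4 },
);
my %db = (db1 => 1, db2 => 1);

# Build one row: a 20-character left-justified label, then one
# 10-character right-justified column per cell. Widths are arbitrary.
sub format_row {
    my ($label, @cells) = @_;
    return sprintf('%-20s' . ('%10s' x @cells) . "\n", $label, @cells);
}

print format_row('collectionTime', sort keys %db);
for my $hour (sort keys %h) {
    # "// 0" (Perl 5.10+) substitutes 0 when there is no value for this hour
    print format_row($hour, map { $h{$hour}{$_} // 0 } sort keys %db);
}
```

      Because the header and the data rows go through the same format string, the columns stay aligned as long as no value is wider than its column.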

      One more thought: for efficiency's sake, we should probably sort the %db hash keys once and put them in an array, rather than re-sorting them every time we print a line. But it'll work this way, so we can deal with that next time.
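      A minimal sketch of that sort-once idea, with made-up data and a hypothetical array name @db_names:

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my %db = (db2 => 1, db1 => 1, db3 => 1);
my %h  = (
    '2015-06-05-10' => { db1 => 8, db2 => 7, db3 => 1 },
    '2015-06-05-11' => { db1 => 4, db2 => 2, db3 => 9 },
);

# Sort the database names once, then reuse the array
# for the header and for every data row.
my @db_names = sort keys %db;

print join(' ', 'collectionTime', @db_names), "\n";
for my $hour (sort keys %h) {
    print join(' ', $hour, map { $h{$hour}{$_} // 0 } @db_names), "\n";
}
```

      With only a handful of database names the saving is small, but it also guarantees the header and the rows use exactly the same column order.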

      Aaron B.

        Thanks a lot for your guidance, Sir. Here is the code with the latest updates:

        #!/usr/bin/env perl
        use strict;
        use warnings;

        my %h;
        my %m;
        my %db;

        while (<DATA>) {
            chomp;
            my @fields = split;
            my ($date, $database_name, $speed) = @fields[1,2,3];
            my ($date_hour, $minute) = split(/:/, $date);
            my $date_hour_minute = join(':', $date_hour, $minute);
            $h{$date_hour}{$database_name} += $speed;
            $m{$date_hour_minute}{$database_name} += $speed;
            $db{$database_name} = 1;
        }

        print " collectionTime";
        for my $db_keys (sort keys %db) {
            print " $db_keys";   # adjust spaces to line things up
        }
        print "\n";

        for my $h_keys (sort keys %h) {
            print $h_keys;   # print the date/hour
            for my $db_keys (sort keys %db) {
                print " $h{$h_keys}{$db_keys}";   # pad with enough spaces to match header
            }
            print "\n";   # this goes outside the inner loop, to end the line
        }

        The number of databases at any time will be fewer than 10. Do you think sorting the %db hash keys once and putting them in an array will still be useful here?

        One more clarification: the total amount of data to be processed will be around 100k records. Do you think this approach will be efficient enough? Please guide me if you think there is room for improving efficiency.