Computing results through Arrays

by yasser8@gmail.com (Novice)
on Jun 05, 2015 at 08:19 UTC ( [id://1129170] )

yasser8@gmail.com has asked for the wisdom of the Perl Monks concerning the following question:

I need help computing and reporting data that is collected every minute for each database. The first column is the server name, the second column is the collection time in the format +%Y-%m-%dT%H:%M:%S%z, the third column is the database name, and the last column is the throughput performed by each database in MB/sec.

Snippet of data:

server01: 2015-06-01T12:40:03-04:00 DB101 10 MB/sec
server01: 2015-06-01T12:40:03-04:00 DB202 5 MB/sec
server01: 2015-06-01T12:40:03-04:00 ASM 2 MB/sec
server01: 2015-06-01T12:40:03-04:00 MYDB101 2 MB/sec
server01: 2015-06-01T12:40:03-04:00 MYDB202 5 MB/sec
server01: 2015-06-01T12:40:03-04:00 _OTHER_DB_ 30 MB/sec
server01: 2015-06-01T12:41:03-04:00 DB101 3 MB/sec
server01: 2015-06-01T12:41:03-04:00 DB202 4 MB/sec
server01: 2015-06-01T12:41:03-04:00 ASM 2 MB/sec
server01: 2015-06-01T12:41:03-04:00 MYDB101 9 MB/sec
server01: 2015-06-01T12:41:03-04:00 MYDB202 7 MB/sec
server01: 2015-06-01T12:41:03-04:00 _OTHER_DB_ 50 MB/sec
server02: 2015-06-01T12:40:03-04:00 DB101 90 MB/sec
server02: 2015-06-01T12:40:03-04:00 DB202 9 MB/sec
server02: 2015-06-01T12:40:03-04:00 ASM 2 MB/sec
server02: 2015-06-01T12:40:03-04:00 MYDB101 3 MB/sec
server02: 2015-06-01T12:40:03-04:00 MYDB202 1 MB/sec
server02: 2015-06-01T12:40:03-04:00 _OTHER_DB_ 90 MB/sec
server02: 2015-06-01T12:41:03-04:00 DB101 1 MB/sec
server02: 2015-06-01T12:41:03-04:00 DB202 4 MB/sec
server02: 2015-06-01T12:41:03-04:00 ASM 2 MB/sec
server02: 2015-06-01T12:41:03-04:00 MYDB101 7 MB/sec
server02: 2015-06-01T12:41:03-04:00 MYDB202 7 MB/sec
server02: 2015-06-01T12:41:03-04:00 _OTHER_DB_ 55 MB/sec

I want to report it in the two ways shown below. Note that I am not interested in grouping by server name; I just want to group the MB/sec values for each database name, at both the minute level and the hour level.

Sample of Required output:

Frequency Minute:

collectionTime             DB101  DB202  ASM  MYDB101  MYDB202  _OTHER_DB_
2015-06-01T12:40:03-04:00  100    14     4    5        6        140
2015-06-01T12:41:03-04:00  4      8      4    16       14       105

Frequency Hour:

collectionTime             DB101  DB202  ASM  MYDB101  MYDB202  _OTHER_DB_
2015-06-01T12:00:00-04:00
2015-06-01T13:00:00-04:00

Please help me with this requirement. I am still learning Perl programming and would be thankful if someone can help me here.

Replies are listed 'Best First'.
Re: Computing results through Arrays
by Anonymous Monk on Jun 05, 2015 at 08:42 UTC

      After some searching, I found an elegant solution from Aaron, but I am still not able to meet my requirement.

#!/usr/bin/env perl
use strict;
use warnings;

my $key  = shift;
my @cols = @ARGV;
my %h;

die "Usage: grp.pl id_column field1 [field2]...\n" unless @cols;

# $ARGV[0] is the Base column
# $ARGV[1..x] is the list of columns to add up
while (<DATA>) {
    chomp;
    my @f = split;
    for (@cols) {
        $h{$f[$key]}{$_}{t} += $f[$_];
        $h{$f[$key]}{$_}{n}++;
    }
}

for my $k (sort keys %h) {
    print $k;
    print "\t$h{$k}{$_}{n}\t$h{$k}{$_}{t}" for @cols;
    print "\n";
}

__DATA__
server01: 2015-06-01T12:40:03-04:00 DB101 10 MB/sec
server01: 2015-06-01T12:40:03-04:00 DB202 5 MB/sec
server01: 2015-06-01T12:40:03-04:00 ASM 2 MB/sec
server01: 2015-06-01T12:40:03-04:00 MYDB101 2 MB/sec
server01: 2015-06-01T12:40:03-04:00 MYDB202 5 MB/sec
server01: 2015-06-01T12:40:03-04:00 _OTHER_DB_ 30 MB/sec
server01: 2015-06-01T12:41:03-04:00 DB101 3 MB/sec
server01: 2015-06-01T12:41:03-04:00 DB202 4 MB/sec
server01: 2015-06-01T12:41:03-04:00 ASM 2 MB/sec
server01: 2015-06-01T12:41:03-04:00 MYDB101 9 MB/sec
server01: 2015-06-01T12:41:03-04:00 MYDB202 7 MB/sec
server01: 2015-06-01T12:41:03-04:00 _OTHER_DB_ 50 MB/sec
server02: 2015-06-01T12:40:03-04:00 DB101 90 MB/sec
server02: 2015-06-01T12:40:03-04:00 DB202 9 MB/sec
server02: 2015-06-01T12:40:03-04:00 ASM 2 MB/sec
server02: 2015-06-01T12:40:03-04:00 MYDB101 3 MB/sec
server02: 2015-06-01T12:40:03-04:00 MYDB202 1 MB/sec
server02: 2015-06-01T12:40:03-04:00 _OTHER_DB_ 90 MB/sec
server02: 2015-06-01T12:41:03-04:00 DB101 1 MB/sec
server02: 2015-06-01T12:41:03-04:00 DB202 4 MB/sec
server02: 2015-06-01T12:41:03-04:00 ASM 2 MB/sec
server02: 2015-06-01T12:41:03-04:00 MYDB101 7 MB/sec
server02: 2015-06-01T12:41:03-04:00 MYDB202 7 MB/sec
server02: 2015-06-01T12:41:03-04:00 _OTHER_DB_ 55 MB/sec
      I got the result below:

      ./grp.pl 2 3
      ASM         4   8
      DB101       4   104
      DB202       4   22
      MYDB101     4   21
      MYDB202     4   20
      _OTHER_DB_  4   225

      Could you please help me find a way to group on the time column by hour and print the result as per my requirement below.

      Sample of Required output:

      Frequency Minute:

      collectionTime             DB101  DB202  ASM  MYDB101  MYDB202  _OTHER_DB_
      2015-06-01T12:40:03-04:00  100    14     4    5        6        140
      2015-06-01T12:41:03-04:00  4      8      4    16       14       105

      Frequency Hour:

      collectionTime             DB101  DB202  ASM  MYDB101  MYDB202  _OTHER_DB_
      2015-06-01T12:00:00-04:00
      2015-06-01T13:00:00-04:00

        As Laurent_R said, your requirements have another dimension (or two), so my script won't work for this except for some of the basic ideas. You're probably going to want three hashes: one to collect per-hour values (%h), one to collect per-minute values (%m), and one to collect database names (%db). Then you'll need to:

        for each line
            parse out the date-hour, date-hour-minute, database name, and speed
            add speed to $h{date-hour}{database name}
            add speed to $m{date-hour-minute}{database name}
            $db{database name} = 1          # put database name in hash
        loop through sorted keys of %db
            print them as headers, formatted to fit what's coming below
        loop through keys of %h (sorted if you want)
            print the key (the date-hour)
            loop through sorted keys of %db
                print $h{$key}{database name}
            print a newline
        now do the same with the per-minute hash %m

        Try coding that, and let us know if you need help.
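
        If you get stuck, here is a rough sketch of those steps. It is only a sketch: the field positions, the substr-based hour truncation, the tab-separated output, and reading the log from STDIN or a file named on the command line are all my own assumptions about your data, so adjust as needed.

#!/usr/bin/env perl
use strict;
use warnings;

my (%h, %m, %db);    # per-hour sums, per-minute sums, database names seen

while (my $line = <>) {
    # assumed field layout: server, timestamp, database name, speed, "MB/sec"
    my (undef, $time, $dbname, $speed) = split ' ', $line;
    next unless defined $speed;    # skip blank or malformed lines

    # hour key: keep "YYYY-MM-DDTHH", zero the minutes and seconds, keep the offset
    my $hour = substr($time, 0, 13) . ':00:00' . substr($time, -6);

    $m{$time}{$dbname} += $speed;    # per-minute totals
    $h{$hour}{$dbname} += $speed;    # per-hour totals
    $db{$dbname} = 1;                # remember the database name for the headers
}

my @dbs = sort keys %db;

for ([ 'Frequency Minute:' => \%m ], [ 'Frequency Hour:' => \%h ]) {
    my ($title, $sums) = @$_;
    print "$title\n";
    print join("\t", 'collectionTime', @dbs), "\n";
    for my $t (sort keys %$sums) {
        print join("\t", $t, map { $sums->{$t}{$_} // 0 } @dbs), "\n";
    }
    print "\n";
}

        Note that the database columns come out in alphabetical order rather than the order in your sample, and the // (defined-or) operator needs perl 5.10 or later.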

        Aaron B.
        Available for small or large Perl jobs and *nix system administration; see my home node.

        Hi,

        Your data structure is too simple for the level of detail that you want to display: your %h hash needs an extra level of information (the time). Once you have added that, it should only be a matter of displaying it correctly.
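
        To illustrate (just a fragment, assuming the same field layout as in grp.pl above, where column 1 holds the timestamp), the accumulation inside the while loop would gain the time as an outer key, for example:

# inside the while (<DATA>) loop of grp.pl, after "my @f = split;"
my $time = $f[1];                              # column 1 is the collection time
for (@cols) {
    $h{$time}{ $f[$key] }{$_}{t} += $f[$_];    # extra level: keyed by time first
    $h{$time}{ $f[$key] }{$_}{n}++;
}

        The printing code then needs one more nested loop: an outer loop over the sorted time keys, and an inner one over the database names.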

Re: Computing results through Arrays
by GotToBTru (Prior) on Jun 05, 2015 at 12:51 UTC

    The need to present the data using different internal organizations screams DATABASE to me. See this tutorial here at PM; Google will find other examples.

    Dum Spiro Spero

      I agree that a database could be a good way to store this data, especially if it's going to be mined in various ways. However, if I'm given this file as-is and told to produce the reports specified by the OP, I'm not going to go to the trouble to put it in a database so that I can run SQL statements on it. I can write a script to get it directly in far less time, so a database will only become an advantage if I need to pull reports from it in several different ways. (And I already know SQL; the newbie would have to learn a second language first.)

      Aaron B.
      Available for small or large Perl jobs and *nix system administration; see my home node.

        aaron_baugher:
        I agree that a database could be a good way to store this data, especially if it's going to be mined in various ways. However, if I'm given this file as-is and told to produce the reports specified by the OP, I'm not going to go to the trouble to put it in a database so that I can run SQL statements on it. I can write a script to get it directly in far less time, so a database will only become an advantage if I need to pull reports from it in several different ways.
        I agree with every single word of what you say, Aaron.

        To me, using a database is overkill for such a simple problem. I would certainly not do that unless there are some other very good reasons to do it.

        Yeah. That's why my reply is a good third reply. As long as the scope of the problem stays this small and this specific, no need to bring out the big guns. In case it won't ... the sooner he learns databases and SQL, the better.

        Dum Spiro Spero

      Actually, I have been a working DBA for more than 12 years, and I could meet this requirement in no time if it were in SQL. But unfortunately these data are pulled from Oracle Exadata Storage servers, and I don't want to store them in production databases that are purely meant for storing customer data. This is why I am striving very hard to learn Perl from experts like you guys... one good thing I notice in the Perl world is that Perl experts are very helpful by nature. Thanks a lot for helping beginners like me!

        these data are pulled from Oracle Exadata Storage servers, and I don't want to store them in production databases that are purely meant for storing customer data.

        Use SQLite or a local PostgreSQL. Both are free as in speech and free as in beer, and both are supported by DBI: DBD::SQLite, DBD::Pg. Note that DBD::SQLite already includes the database engine (SQLite), so after cpan DBD::SQLite, you can start using SQLite without installing anything else.
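
        For example, a minimal DBI/SQLite sketch for this data might look like the following. The database file name, table name, and column names are made up for illustration, and it prints plain hour/database/total rows rather than the pivoted layout shown above.

#!/usr/bin/env perl
use strict;
use warnings;
use DBI;

# 'stats.db' and the io_stats table below are illustrative names only
my $dbh = DBI->connect('dbi:SQLite:dbname=stats.db', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do(q{
    CREATE TABLE IF NOT EXISTS io_stats (
        server  TEXT,
        ctime   TEXT,
        db_name TEXT,
        mb_sec  REAL
    )
});

my $ins = $dbh->prepare(
    'INSERT INTO io_stats (server, ctime, db_name, mb_sec) VALUES (?, ?, ?, ?)');

while (<>) {                               # load the log file into the table
    my ($server, $time, $db, $speed) = split;
    $ins->execute($server, $time, $db, $speed) if defined $speed;
}

# per-hour totals: truncate the ISO timestamp to the hour with substr
# (the timezone suffix is dropped here for brevity)
my $rows = $dbh->selectall_arrayref(q{
    SELECT substr(ctime, 1, 13) || ':00:00' AS hour, db_name, SUM(mb_sec)
    FROM   io_stats
    GROUP  BY hour, db_name
    ORDER  BY hour, db_name
});
print join("\t", @$_), "\n" for @$rows;

        From there, per-minute totals are just another GROUP BY on the full timestamp.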

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
