Computing results through Arrays

by yasser8@gmail.com (Novice)
on Jun 05, 2015 at 08:19 UTC ( [id://1129170] )

yasser8@gmail.com has asked for the wisdom of the Perl Monks concerning the following question:

I need help computing and reporting data that is collected every minute for each database. The first column is the server name, the second column is the collection time in the format +%Y-%m-%dT%H:%M:%S%z, the third column is the database name, and the last column is the throughput performed by each database in MB/sec.

Snippet of data:

server01: 2015-06-01T12:40:03-04:00 DB101 10 MB/sec
server01: 2015-06-01T12:40:03-04:00 DB202 5 MB/sec
server01: 2015-06-01T12:40:03-04:00 ASM 2 MB/sec
server01: 2015-06-01T12:40:03-04:00 MYDB101 2 MB/sec
server01: 2015-06-01T12:40:03-04:00 MYDB202 5 MB/sec
server01: 2015-06-01T12:40:03-04:00 _OTHER_DB_ 30 MB/sec
server01: 2015-06-01T12:41:03-04:00 DB101 3 MB/sec
server01: 2015-06-01T12:41:03-04:00 DB202 4 MB/sec
server01: 2015-06-01T12:41:03-04:00 ASM 2 MB/sec
server01: 2015-06-01T12:41:03-04:00 MYDB101 9 MB/sec
server01: 2015-06-01T12:41:03-04:00 MYDB202 7 MB/sec
server01: 2015-06-01T12:41:03-04:00 _OTHER_DB_ 50 MB/sec
server02: 2015-06-01T12:40:03-04:00 DB101 90 MB/sec
server02: 2015-06-01T12:40:03-04:00 DB202 9 MB/sec
server02: 2015-06-01T12:40:03-04:00 ASM 2 MB/sec
server02: 2015-06-01T12:40:03-04:00 MYDB101 3 MB/sec
server02: 2015-06-01T12:40:03-04:00 MYDB202 1 MB/sec
server02: 2015-06-01T12:40:03-04:00 _OTHER_DB_ 90 MB/sec
server02: 2015-06-01T12:41:03-04:00 DB101 1 MB/sec
server02: 2015-06-01T12:41:03-04:00 DB202 4 MB/sec
server02: 2015-06-01T12:41:03-04:00 ASM 2 MB/sec
server02: 2015-06-01T12:41:03-04:00 MYDB101 7 MB/sec
server02: 2015-06-01T12:41:03-04:00 MYDB202 7 MB/sec
server02: 2015-06-01T12:41:03-04:00 _OTHER_DB_ 55 MB/sec

I want to report it in the two ways shown below. Note that I am not interested in grouping by server name; I just want to group the MB/sec values for each database name, at both the minute level and the hour level.

Sample of Required output:

Frequency Minute:

collectionTime             DB101  DB202  ASM  MYDB101  MYDB202  _OTHER_DB_
2015-06-01T12:40:03-04:00  100    14     4    5        6        140
2015-06-01T12:41:03-04:00  4      8      4    16       14       105

Frequency Hour:

collectionTime             DB101  DB202  ASM  MYDB101  MYDB202  _OTHER_DB_
2015-06-01T12:00:00-04:00
2015-06-01T13:00:00-04:00

Please help me with this requirement. I am still learning Perl programming and would be thankful if someone can help me here.

Replies are listed 'Best First'.
Re: Computing results through Arrays
by Anonymous Monk on Jun 05, 2015 at 08:42 UTC

      After some searching, I found an elegant solution from Aaron, but I am still not able to meet my requirement.

#!/usr/bin/env perl
use strict;
use warnings;

my $key  = shift;
my @cols = @ARGV;
my %h;

die "Usage: grp.pl id_column field1 [field2]...\n" unless @cols;

# $ARGV[0] is the Base column
# $ARGV[1..x] is the list of columns to add up
while (<DATA>) {
    chomp;
    my @f = split;
    for (@cols) {
        $h{$f[$key]}{$_}{t} += $f[$_];
        $h{$f[$key]}{$_}{n}++;
    }
}

for my $k (sort keys %h) {
    print $k;
    print "\t$h{$k}{$_}{n}\t$h{$k}{$_}{t}" for @cols;
    print "\n";
}

__DATA__
server01: 2015-06-01T12:40:03-04:00 DB101 10 MB/sec
server01: 2015-06-01T12:40:03-04:00 DB202 5 MB/sec
server01: 2015-06-01T12:40:03-04:00 ASM 2 MB/sec
server01: 2015-06-01T12:40:03-04:00 MYDB101 2 MB/sec
server01: 2015-06-01T12:40:03-04:00 MYDB202 5 MB/sec
server01: 2015-06-01T12:40:03-04:00 _OTHER_DB_ 30 MB/sec
server01: 2015-06-01T12:41:03-04:00 DB101 3 MB/sec
server01: 2015-06-01T12:41:03-04:00 DB202 4 MB/sec
server01: 2015-06-01T12:41:03-04:00 ASM 2 MB/sec
server01: 2015-06-01T12:41:03-04:00 MYDB101 9 MB/sec
server01: 2015-06-01T12:41:03-04:00 MYDB202 7 MB/sec
server01: 2015-06-01T12:41:03-04:00 _OTHER_DB_ 50 MB/sec
server02: 2015-06-01T12:40:03-04:00 DB101 90 MB/sec
server02: 2015-06-01T12:40:03-04:00 DB202 9 MB/sec
server02: 2015-06-01T12:40:03-04:00 ASM 2 MB/sec
server02: 2015-06-01T12:40:03-04:00 MYDB101 3 MB/sec
server02: 2015-06-01T12:40:03-04:00 MYDB202 1 MB/sec
server02: 2015-06-01T12:40:03-04:00 _OTHER_DB_ 90 MB/sec
server02: 2015-06-01T12:41:03-04:00 DB101 1 MB/sec
server02: 2015-06-01T12:41:03-04:00 DB202 4 MB/sec
server02: 2015-06-01T12:41:03-04:00 ASM 2 MB/sec
server02: 2015-06-01T12:41:03-04:00 MYDB101 7 MB/sec
server02: 2015-06-01T12:41:03-04:00 MYDB202 7 MB/sec
server02: 2015-06-01T12:41:03-04:00 _OTHER_DB_ 55 MB/sec
      I got the result below:

      ./grp.pl 2 3
      ASM         4   8
      DB101       4   104
      DB202       4   22
      MYDB101     4   21
      MYDB202     4   20
      _OTHER_DB_  4   225

      Could you please help me find a way to group on the time column by hour and print the result as per my requirement below.

      Sample of Required output:

      Frequency Minute:

      collectionTime             DB101  DB202  ASM  MYDB101  MYDB202  _OTHER_DB_
      2015-06-01T12:40:03-04:00  100    14     4    5        6        140
      2015-06-01T12:41:03-04:00  4      8      4    16       14       105

      Frequency Hour:

      collectionTime             DB101  DB202  ASM  MYDB101  MYDB202  _OTHER_DB_
      2015-06-01T12:00:00-04:00
      2015-06-01T13:00:00-04:00

        As Laurent_R said, your requirements have another dimension (or two), so my script won't work for this except for some of the basic ideas. You're probably going to want three hashes: one to collect per-hour values (%h), one to collect per-minute values (%m), and one to collect database names (%db). Then you'll need to:

        for each line
            parse out the date-hour, date-hour-minute, database name, and speed
            add speed to $h{date-hour}{database name}
            add speed to $m{date-hour-minute}{database name}
            $db{database name} = 1          # put database name in hash
        loop through sorted keys of %db
            print them as headers, formatted to fit what's coming below
        loop through keys of %h (sorted if you want)
            print the key (the date-hour)
            loop through sorted keys of %db
                print $h{$key}{database name}
            print a newline
        now do the same with the per-minute hash %m

        Try coding that, and let us know if you need help.
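
        If you get stuck, here is a rough sketch of those steps. It is only a sketch: the field positions, the substr-based hour truncation, the tab-separated output, and reading the log from STDIN or a file named on the command line are all my own assumptions about your data, so adjust as needed.

#!/usr/bin/env perl
use strict;
use warnings;

my (%h, %m, %db);    # per-hour sums, per-minute sums, database names seen

while (my $line = <>) {
    # assumed field layout: server, timestamp, database name, speed, "MB/sec"
    my (undef, $time, $dbname, $speed) = split ' ', $line;
    next unless defined $speed;    # skip blank or malformed lines

    # hour key: keep "YYYY-MM-DDTHH", zero the minutes and seconds, keep the offset
    my $hour = substr($time, 0, 13) . ':00:00' . substr($time, -6);

    $m{$time}{$dbname} += $speed;    # per-minute totals
    $h{$hour}{$dbname} += $speed;    # per-hour totals
    $db{$dbname} = 1;                # remember the database name for the headers
}

my @dbs = sort keys %db;

for ([ 'Frequency Minute:' => \%m ], [ 'Frequency Hour:' => \%h ]) {
    my ($title, $sums) = @$_;
    print "$title\n";
    print join("\t", 'collectionTime', @dbs), "\n";
    for my $t (sort keys %$sums) {
        print join("\t", $t, map { $sums->{$t}{$_} // 0 } @dbs), "\n";
    }
    print "\n";
}

        Note that the database columns come out in alphabetical order rather than the order in your sample, and the // (defined-or) operator needs perl 5.10 or later.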

        Aaron B.
        Available for small or large Perl jobs and *nix system administration; see my home node.

        Hi,

        Your data structure is too simple for the level of detail that you want to display: your %h hash needs an extra level of information (the time). Once you have added that, it should only be a matter of displaying it correctly.
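
        To illustrate (just a fragment, assuming the same field layout as in grp.pl above, where column 1 holds the timestamp), the accumulation inside the while loop would gain the time as an outer key, for example:

# inside the while (<DATA>) loop of grp.pl, after "my @f = split;"
my $time = $f[1];                              # column 1 is the collection time
for (@cols) {
    $h{$time}{ $f[$key] }{$_}{t} += $f[$_];    # extra level: keyed by time first
    $h{$time}{ $f[$key] }{$_}{n}++;
}

        The printing code then needs one more nested loop: an outer loop over the sorted time keys, and an inner one over the database names.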

Re: Computing results through Arrays
by GotToBTru (Prior) on Jun 05, 2015 at 12:51 UTC

    The need to present the data using different internal organizations screams DATABASE to me. See this tutorial here at PM; Google will find other examples.

    Dum Spiro Spero

      I agree that a database could be a good way to store this data, especially if it's going to be mined in various ways. However, if I'm given this file as-is and told to produce the reports specified by the OP, I'm not going to go to the trouble to put it in a database so that I can run SQL statements on it. I can write a script to get it directly in far less time, so a database will only become an advantage if I need to pull reports from it in several different ways. (And I already know SQL; the newbie would have to learn a second language first.)

      Aaron B.
      Available for small or large Perl jobs and *nix system administration; see my home node.

        aaron_baugher:
        I agree that a database could be a good way to store this data, especially if it's going to be mined in various ways. However, if I'm given this file as-is and told to produce the reports specified by the OP, I'm not going to go to the trouble to put it in a database so that I can run SQL statements on it. I can write a script to get it directly in far less time, so a database will only become an advantage if I need to pull reports from it in several different ways.
        I agree with every single word of what you say, Aaron.

        To me, using a database is overkill for such a simple problem. I would certainly not do that unless there are some other very good reasons to do it.

        Yeah. That's why my reply is a good third reply. As long as the scope of the problem stays this small and this specific, no need to bring out the big guns. In case it won't ... the sooner he learns databases and SQL, the better.

        Dum Spiro Spero

      Actually, I have been a working DBA for more than 12 years, and I could meet this requirement in no time if it were in SQL. But unfortunately these data are pulled from Oracle Exadata Storage servers, and I don't want to store them in production databases that are purely meant for storing customer data. This is why I am striving very hard to learn Perl from experts like you guys... one good thing I notice in the Perl world is that Perl experts are very helpful by nature. Thanks a lot for helping beginners like me!

        these data are pulled from Oracle Exadata Storage servers, and I don't want to store them in production databases that are purely meant for storing customer data.

        Use SQLite or a local PostgreSQL. Both are free as in speech and free as in beer, and both are supported by DBI: DBD::SQLite, DBD::Pg. Note that DBD::SQLite already includes the database engine (SQLite), so after cpan DBD::SQLite, you can start using SQLite without installing anything else.
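
        For example, a minimal DBI/SQLite sketch for this data might look like the following. The database file name, table name, and column names are made up for illustration, and it prints plain hour/database/total rows rather than the pivoted layout shown above.

#!/usr/bin/env perl
use strict;
use warnings;
use DBI;

# 'stats.db' and the io_stats table below are illustrative names only
my $dbh = DBI->connect('dbi:SQLite:dbname=stats.db', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do(q{
    CREATE TABLE IF NOT EXISTS io_stats (
        server  TEXT,
        ctime   TEXT,
        db_name TEXT,
        mb_sec  REAL
    )
});

my $ins = $dbh->prepare(
    'INSERT INTO io_stats (server, ctime, db_name, mb_sec) VALUES (?, ?, ?, ?)');

while (<>) {                               # load the log file into the table
    my ($server, $time, $db, $speed) = split;
    $ins->execute($server, $time, $db, $speed) if defined $speed;
}

# per-hour totals: truncate the ISO timestamp to the hour with substr
# (the timezone suffix is dropped here for brevity)
my $rows = $dbh->selectall_arrayref(q{
    SELECT substr(ctime, 1, 13) || ':00:00' AS hour, db_name, SUM(mb_sec)
    FROM   io_stats
    GROUP  BY hour, db_name
    ORDER  BY hour, db_name
});
print join("\t", @$_), "\n" for @$rows;

        From there, per-minute totals are just another GROUP BY on the full timestamp.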

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
