Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Group by data in shell pipes

by dpavlin (Friar)
on May 22, 2008 at 13:18 UTC ( [id://687952]=CUFP: print w/replies, xml ) Need Help??

My mind is just too accustomed to RDBMS engines to accept that I can't have GROUP BY in my shell pipes. So I wrote one groupby.pl.

Aside from fact that it somewhat looks like perl golfing (which I'm somewhat proud of), let's see how does it look:

dpavlin@llin:~/private/perl$ ps axv | \
 ./groupby.pl 'sum:($6+$7+$8),10,count:10,min:($6+$7+$8),max:($6+$7+$8)' | \
 sort -k1 -nr | head -10 | align
440947 /usr/lib/iceweasel/firefox-bin        1 440947 440947
390913 /usr/sbin/apache2                    11  22207  39875
180943 /usr/bin/X                            1 180943 180943
135279 /usr/bin/pinot-dbus-daemon            1 135279 135279
122254 mocp                                  2  25131  97123
 84887 pinot                                 1  84887  84887
 78279 postgres:                             5  10723  21971
 70030 /usr/bin/perl                         6   6959  15615
 50213 /bin/bash                             7   6351   7343
 49266 /usr/lib/postgresql/8.2/bin/postgres  2  24631  24635
This will display total usage for process, it's name, number of such processes and range of memory usage. We can then use old friend sum.pl to produce console graph, but I already wrote about it.

So, let's move to another example, this time for OpenVZ. Let's see how much memory is each virtual machine using (and get number of processes for free):

$ vzps -E axv --no-headers | ./groupby.pl 'sum:($7+$8+$9*1024),1,count:1'
2209504127 0 265
611768242 212024 38
162484775 212037 19
170797534 212052 38
104853258 212226 26
712007227 212253 21
But wouldn't it be nice to display hostnames instead of VEID numbers? We can, using --join and --on options (which are really backticks on steroids):
$ vzps -E axv --no-headers | \
 ./groupby.pl 'sum:($7+$8+$9*1024),1,count:1' \
 --join 'sudo vzlist -H -o veid,hostname' --on 2                                 
2146263206 0 259
675835528 saturn.ffzg.hr 40
162484775 arh.rot13.org 19
170797534 koha-dev.rot13.org 38
104853258 koha.ffzg.hr 26
712011323 zemlja.ffzg.hr 21
Which brings us to final result:
$ vzps -E axv --no-headers | \
 ./groupby.pl 'sum:($7+$8+$9*1024),1,count:1' \
  --join 'sudo vzlist -H -o veid,hostname' --on 2 | \
  sort -rn | align | ./sum.pl -h
0                  260   2105M OOOOOOOOOOOOOOOOOOO                  2105M
zemlja.ffzg.hr      21    679M OOOOOO-------------------            2784M
saturn.ffzg.hr      35    512M OOOO--------------------------       3296M
koha-dev.rot13.org  38    162M O------------------------------      3459M
arh.rot13.org       19    154M O--------------------------------    3614M
koha.ffzg.hr        26     99M ----------------------------------   3714M
So, here you have it: SQL like query language for your shell pipes.

2share!2flame...

Replies are listed 'Best First'.
Re: Group by data in shell pipes
by pmonk4ever (Friar) on May 22, 2008 at 17:25 UTC
    dpavlin,

    Wow, that is totally cool!!! 3 thumbs way UP!!!

    :>)

    ki6jux

    "No trees were harmed in the creation of this node. However, a rather large number of electrons were somewhat inconvenienced."

Re: Group by data in shell pipes
by EvanCarroll (Chaplain) on May 22, 2008 at 20:49 UTC
    Without looking at all of your code (largely because you're using a vzps, a utility I don't have). I would shameless interject DataExtract::FixedWidth, it might help you parse some of this data. And provide you with a different more friendly less golfish API into it, (if you have a header row). Your idea of SQL in a shell interests me, I might try to implement something myself for fun.


    Evan Carroll
    I hack for the ladies.
    www.EvanCarroll.com

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: CUFP [id://687952]
Approved by Corion
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (6)
As of 2024-03-29 10:04 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found