PerlMonks  

How best to tell when my hash is "full" (all values defined)?

by OfficeLinebacker (Chaplain)
on Dec 17, 2006 at 03:31 UTC

OfficeLinebacker has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, esteemed monks!

I create a hash with spaces for where I need information (just declare the keys with a slice). I then loop through some lines of data, parsing it out for the info I need. Generally, the data I need are near the beginning of the data, and I would like to escape from the loop as soon as I have the data I need, rather than after I've parsed all the lines. Here's what I have so far:

# get running processes, users, 1 min load average, total mem, free mem (ultimately display free mem %)
my @raw = `top -b -n 1`;
my %stats;
@stats{'users','load','tmem','fmem','runproc'} = ();
my $its = 0;
foreach my $line (@raw) {
    if ($line =~ /up\s.+\s(\d+)\suser.+\s+load\saverage:\s+(\d+\.\d{2}),/) {
        $stats{users} = $1;
        $stats{load}  = $2;
    }
    elsif ($line =~ /(?:Tasks|processes):.+\s+(\d+) running/i) {
        $stats{runproc} = $1;
    }
    elsif ($line =~ /^Mem:\s+(\d+)k\s+(?:total|av),.+used,\s+(\d+)k\s+free/) {
        $stats{tmem} = $1;
        $stats{fmem} = $2;
    }
    $its++;
    my $uds = 0;
    foreach (values(%stats)) {
        defined or ++$uds;
    }
    if ($uds) {
        print "Still $uds undefined values";
    }
    else {
        print "I'm done!";
        last;
    }
} ## end foreach my $line (@raw)
print "I looped $its times to collect the data I need from $host";

It just seems kind of clunky to me. That's partially because of the debugging output (it will eventually be silent--something like ($uds) || last; ). However, I'll probably want to do an additional check after the loop to make sure I didn't parse every single process and still come up empty on some of my stats due to (unexpected output from top|poorly formed REs).

Yes, I do know that "free" memory reported by top is not necessarily indicative of how much memory is available for new processes, due to buffering.


I like computer programming because it's like Legos for the mind.

Replies are listed 'Best First'.
Re: How best to tell when my hash is "full" (all values defined)?
by TimToady (Parson) on Dec 17, 2006 at 04:23 UTC
    How 'bout don't pre-create the keys, then just use keys() to count when the number of keys is up to the number you expect?
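A minimal sketch of that suggestion, run against a few made-up lines rather than live top output. The sample lines and the `$expected` count are assumptions for illustration; the patterns are the ones from the question.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for `top -b -n 1` output.
my @raw = (
    "top - 04:23:01 up 12 days,  3:02,  4 users,  load average: 0.00, 0.01, 0.05\n",
    "Tasks:  95 total,   1 running,  94 sleeping,   0 stopped,   0 zombie\n",
    "Cpu(s):  0.3%us,  0.1%sy,  0.0%ni, 99.5%id\n",
    "Mem:    775664k total,   741616k used,    34048k free,    50000k buffers\n",
    "Swap:  1000000k total,        0k used,  1000000k free\n",
);

my %stats;           # deliberately not pre-populated
my $expected = 5;    # users, load, runproc, tmem, fmem
my $its = 0;

foreach my $line (@raw) {
    $its++;
    if ($line =~ /up\s.+\s(\d+)\suser.+\s+load\saverage:\s+(\d+\.\d{2}),/) {
        @stats{qw(users load)} = ($1, $2);
    }
    elsif ($line =~ /(?:Tasks|processes):.+\s+(\d+) running/i) {
        $stats{runproc} = $1;
    }
    elsif ($line =~ /^Mem:\s+(\d+)k\s+(?:total|av),.+used,\s+(\d+)k\s+free/) {
        @stats{qw(tmem fmem)} = ($1, $2);
    }
    # A key exists only once its value has been captured, so the key
    # count alone says whether the hash is "full".
    last if keys(%stats) == $expected;
}
print "Collected ", scalar(keys %stats), " values in $its lines\n";
```

With these sample lines the loop stops at the Mem: line and never looks at the Swap: line.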
      D'OH! Thanks!

        For that matter, you could just do it like this (adding the suggestion from another reply below):
        my %stats;
        open(TOP, "top -b -n 1 |");
        my $ndone = 0;
        while (<TOP>) {
            if ( /up\s.+\s(\d+)\suser.+\s+load\saverage:\s+(\d+\.\d{2}),/ ) {
                $stats{users} = $1;
                $stats{load}  = $2;
                $ndone += 2;
            }
            elsif ( /(?:Tasks|processes):.+\s+(\d+) running/i ) {
                $stats{runproc} = $1;
                $ndone++;
            }
            elsif ( /^Mem:\s+(\d+)k\s+(?:total|av),.+used,\s+(\d+)k\s+free/ ) {
                $stats{tmem} = $1;
                $stats{fmem} = $2;
                $ndone += 2;
            }
            last if ( $ndone == 5 );
        }
        if ( $ndone < 5 ) {
            warn "I only got $ndone factoids from top. Bummer.\n";
        }
        else {
            print "I got everything in just $. lines of input.\n";
        }
Re: How best to tell when my hash is "full" (all values defined)?
by McDarren (Abbot) on Dec 17, 2006 at 04:42 UTC
    It seems to me that you only need data from the first three lines of the output from top. So why not just read those three lines and then exit the loop?

    Also, I'd be more inclined to use a piped open and a while loop, rather than reading the entire output into an array. Especially as you are only using the first few lines.
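As a rough sketch of the piped-open idea, here is the same pattern with a lexical filehandle and the list form of open, which bypasses the shell. `read_until` is a hypothetical helper, not something from this thread, and the demonstration runs a stand-in command (the running perl itself) so that it works on machines without top.

```perl
use strict;
use warnings;

# Read a command's output line by line and stop as soon as the callback
# returns false. read_until() is a hypothetical helper for illustration.
sub read_until {
    my ($want_more, @cmd) = @_;
    # List-form piped open: no shell involved, lexical filehandle.
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    my $lines = 0;
    while (my $line = <$fh>) {
        $lines++;
        last unless $want_more->($line);
    }
    close $fh;    # can return false if the child was cut off early
    return $lines;
}

# Stand-in command; for the case discussed here it would be
# ('top', '-b', '-n', '1').
my %seen;
my $n = read_until(
    sub {
        my ($line) = @_;
        $seen{$1} = 1 if $line =~ /^(\w+)/;
        return keys(%seen) < 2;    # keep reading until two words are seen
    },
    $^X, '-e', 'print "alpha\n", "beta\n", "gamma\n"',
);
print "stopped after $n lines\n";
```

Because the loop reads one line at a time, it can leave as soon as the callback is satisfied instead of first slurping everything into an array.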

    Here is your script re-written slightly - the output appears to be the same as your original version. (Note - I didn't touch your pattern matches)

    Update: whoops! In my enthusiasm to re-write your code, I forgot to address your main question. As TimToady pointed out, all you need to do is not predefine the hash keys, and then when you are done compare the number of keys you have against the number you were expecting. Code updated to reflect that.

    #!/usr/bin/perl -wl
    use strict;
    use Data::Dumper::Simple;

    my %stats;
    my $expected_keys = 5;
    my $top = "/usr/bin/top -b -n 1";
    open(TOP, "$top|") or die "Cannot read top:$!";
    while (my $line = <TOP>) {
        if ($line =~ /up\s.+\s(\d+)\suser.+\s+load\saverage:\s+(\d+\.\d{2}),/) {
            $stats{users} = $1;
            $stats{load}  = $2;
        }
        elsif ($line =~ /(?:Tasks|processes):.+\s+(\d+) running/i) {
            $stats{runproc} = $1;
        }
        elsif ($line =~ /^Mem:\s+(\d+)k\s+(?:total|av),.+used,\s+(\d+)k\s+free/) {
            $stats{tmem} = $1;
            $stats{fmem} = $2;
            last;
        }
    }
    print Dumper(%stats);
    my $num_keys = scalar keys %stats;
    if ($num_keys != $expected_keys) {
        print "Whoops! Expected $expected_keys keys, but I got $num_keys!";
    }

    Output on my machine:

    %stats = (
               'fmem' => '34048',
               'tmem' => '775664',
               'users' => '4',
               'runproc' => '1',
               'load' => '0.00'
             );

    Hope this helps,
    Darren :)

Re: How best to tell when my hash is "full" (all values defined)?
by BrowserUk (Patriarch) on Dec 17, 2006 at 06:47 UTC

    Perhaps overkill for this application, but useful where the gathering of a predetermined set of fields is spread over a more complex gathering process.

    1. Predefine the hash keys, setting the values to undef.
    2. Use Internals::SetReadOnly to ensure that no other spurious keys can be autovivified accidentally.
    3. (Updated) Use keys( %hash ) == grep defined, values( %hash ) to ensure that you got them all.

      Or more simply, 0 == grep !defined, values %hash;

    use Internals qw[ SetReadOnly ];

    my %hash;
    undef @hash{ qw[ the quick brown fox ] };
    SetReadOnly( \%hash );
    ...
    while( <$fh> ) {
        if( ... some capturing regex ... ) {
            $hash{ $1 } = $2;
        }
        elsif( ... regex ... ) {
            $hash{ $1 } = $2;
        }
        else {
            $hash{ fox } = FOX_DEFAULT;
        }
        last if keys %hash == grep defined, values( %hash ); ## Updated, see below.
    }

    In the event that one of your regexes accidentally matches the wrong key/value pairing, setting (say) $1 = 'bill', then you'll get a fatal error:

    Attempt to access disallowed key 'bill' in a restricted hash at yourscript.pl line nn, <$fh> line mm

    This tells you not only on which line of code the error occurred, but also which line of the data file (and the value) was involved.
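For comparison, a sketch of the same restricted-hash idea using `lock_keys` from the core Hash::Util module. This swaps in a different interface from the Internals call above, so treat it as an alternative under that assumption rather than a restatement of the code in this reply; the key names are the placeholder ones used there.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys);

my %hash;
@hash{qw(the quick brown fox)} = ();    # predefine the keys, values undef
lock_keys(%hash);                       # restrict the hash to exactly these keys

$hash{quick} = 1;                       # fine: existing key, values may still change

# Storing under an unknown key dies instead of autovivifying it.
my $ok  = eval { $hash{bill} = 2; 1 };
my $err = $ok ? '' : $@;
print "caught: $err" if $err;

# "Full" test: count the values that are still undefined.
my $missing = grep { !defined } values %hash;
print "$missing values still undefined\n";
```

The eval is only there to show the error without ending the script; in a real parser you would probably let it die.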


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I'm not sure how your last-condition keys( %hash ) == values( %hash ) works ... isn't this always true for all hashes (the number of keys is equal to the number of values)?

      -- Hofmator

      Code written by Hofmator and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

        Yes, of course it should++. I was concentrating on the idea of avoiding accidental autovivification.



        Yes. It should be

        keys( %hash ) == grep defined, values( %hash )

        (which can be simplified).

Re: How best to tell when my hash is "full" (all values defined)?
by bart (Canon) on Dec 17, 2006 at 10:01 UTC
    To answer your question (which is what you asked, not necessarily what you want :) ): use grep.
    my @undefined = grep !defined $stats{$_}, keys %stats;
    if (@undefined) {
        local $" = ", ";
        my $undefined = @undefined;
        warn "$undefined values were not defined, for keys (@undefined)\n";
    }

    If you just want a simple count, not an elaborate report, you can use the simpler

    my $undefined = grep !defined, values %stats;
    warn "$undefined values were not defined\n" if $undefined;
      bart, thanks for the reply. grep is another one of those functions that I don't know well. Continuing on our theoretical tangent, if you will, could one golf your approach down to
      grep !defined, values %stats || last;

      as a quick and simple escape clause?

      For some reason I like the idea of predeclaring what data I want (by mnemonic), and "filling" an empty data structure, as opposed to creating the key-value pairs on the fly. It's just mental I guess, though I suppose it may increase readability somewhat. Also, is there a memory or performance penalty if you don't predeclare the number of elements in a hash? Returning the scalar of the hash gives me 4/8 with the predeclared version; I suppose I will try it the other way and report back.

      Thanks again to everyone,
      T.


        Hi,
        grep !defined, values %stats || last;
        my $swvn = grep !defined, values %stats;
        print $swvn;
        $swvn || last;
        produces
        3
        2
        2
        0
        
        so that means it doesn't work. The "fix" is to scalar() what grep() returns:
        scalar(grep !defined, values %stats) || last;
        my $swvn = grep !defined, values %stats;
        print $swvn;
        $swvn || last;
        produces
        3
        2
        2
        
        I'll check the memory consumption thing shortly.
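        As far as I can tell, the first form fails because of precedence: grep is a list operator, so everything to its right, including the "|| last", becomes part of its argument list, and the || is never applied to grep's result. Wrapping the call in parentheses, as the scalar() version does, is one fix. Another is a statement modifier, sketched below with a made-up fill order (the @arrivals list is an assumption for illustration, not data from top):

```perl
use strict;
use warnings;

my %stats;
@stats{qw(users load tmem fmem runproc)} = ();    # predeclared, all undef

# Hypothetical order in which values turn up, one per "line".
my @arrivals = qw(users load runproc tmem fmem extra1 extra2);

my $its = 0;
foreach my $key (@arrivals) {
    $its++;
    $stats{$key} = 1 if exists $stats{$key};
    # The modifier puts grep in scalar (boolean) context, so it yields
    # the count of undefined values; no scalar() or || is needed.
    last unless grep { !defined } values %stats;
}
print "left the loop after $its iterations\n";
```

        The low-precedence "or" should work as well (grep !defined, values %stats or last;), since it binds more loosely than a list operator.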

        Hi again :) The following code (I plan to implement the piped open shortly, but just used what I have to test the memory usage of the hashes in the two approaches)
        my @raw = `top -b -n 1`;
        my (%stats, %stats2);
        my $expected_keys = 5;
        @stats{'users','load','tmem','fmem','runproc'} = ();
        my $its = 0;
        foreach my $line (@raw) {
            if ($line =~ /up\s.+\s(\d+)\suser.+\s+load\saverage:\s+(\d+\.\d{2}),/) {
                $stats{users}  = $1;
                $stats{load}   = $2;
                $stats2{users} = $1;
                $stats2{load}  = $2;
            }
            elsif ($line =~ /(?:Tasks|processes):.+\s+(\d+) running/i) {
                $stats{runproc}  = $1;
                $stats2{runproc} = $1;
            }
            elsif ($line =~ /^Mem:\s+(\d+)k\s+(?:total|av),.+used,\s+(\d+)k\s+free/) {
                $stats{tmem}  = $1;
                $stats{fmem}  = $2;
                $stats2{tmem} = $1;
                $stats2{fmem} = $2;
            }
            $its++;
            print scalar(%stats2);
            print scalar(%stats);
            (scalar(keys(%stats2)) == $expected_keys) && last;
        } ## end foreach my $line (@raw)
        outputs
        
        2/8
        4/8
        2/8
        4/8
        2/8
        4/8
        4/8
        4/8
        
        So the memory usage is no different (correct me if I am wrong--my understanding is that the number we are worried about (if we're worried about memory consumption) is the divisor).

        I'll go ahead and make the call to `top` a piped open and have done with it. Thanks for your help (everyone who posted).

        T.

Re: How best to tell when my hash is "full" (all values defined)?
by OfficeLinebacker (Chaplain) on Dec 17, 2006 at 09:29 UTC

    ++McDarren, Tim, graff, Browser

    McDarren: Yes, I was thinking of truncating the output from top, but it's tricky depending on whether hyperthreading is on and SMP view is on, etc. I guess I could just do a head. Would open(TOP, "top -b -n 1 | head -n 10 |") work? For that matter, would
    foreach my $l (`top -b -n 1`) {
        # or, for that matter, foreach (`top -b -n 1 | head -n 10`)?
        # the code goes here
    }
    work?
    Browser, I don't quite get how doing assignments like $stats{$1}=$2 would eventually make keys( %hash ) == values( %hash ) true. The values I am picking off from the data are mostly numbers, while the keys are strings I made up that describe what the numbers are.
    I do kind of like the idea of making the list of keys immutable. But, since the values for the keys are fixed ahead of time (in my mind, anyway) and do not exist in the data I am parsing, I don't quite get the practical use of that in this particular instance.

    But, it is 4 am here and I am punchy so I could be missing the obvious (like don't predeclare the hash keys).

    Finally, since the bottleneck here is waiting for top to finish (and I choose top because it's "one-stop shopping" for the stats I want), could we background the piped open, (as open(TOP, "top -b -n 1 & |")), theoretically allowing us to start reading the first few lines before the whole list of tasks is even done being written?

    Thanks,
    Terrence

Re: How best to tell when my hash is "full" (all values defined)?
by OfficeLinebacker (Chaplain) on Dec 18, 2006 at 17:48 UTC
    OK, since I am feeling kind of curious today, I decided to try benchmarking the various approaches. On my first run (with 100 iterations), I got:
    Method 1 is foreach my $line (`top -b -n 1`) and creating the hash keys as we go.
    Method 2 is foreach my $line (`top -b -n 1`) and precreating the hash keys.
    Method 3 is foreach my $line (@raw) where @raw=`top -b -n 1` and precreating the hash keys.
    Method 4 is while (my $line = <TOP>) with open(TOP, "top -b -n 1|") and precreating the hash keys.
    Method 1 mean:  0.560507924556732 stddev:  0.00320759260168446!
    Method 2 mean:  0.560737266540527 stddev:  0.00290542841782241!
    Method 3 mean:  0.561151208877563 stddev:  0.00359781593804784!
    Method 4 mean:  0.558518676757812 stddev:  0.00518282488688389!
    T-stat for mean 1 and mean 2 is 0.529922998231174
    T-stat for mean 1 and mean 3 is 1.33459955560607
    T-stat for mean 1 and mean 4 is 3.26368009466516
    T-stat for mean 2 and mean 3 is 0.895111546983644
    T-stat for mean 2 and mean 4 is 3.73396330152602
    T-stat for mean 3 and mean 4 is 4.17253188415264
    

    Note that I did not throw away outliers, and I have no control over what other users were doing on the machine, so this is by no means rigorous, yet I think it's still useful. I think it's pretty clear that piped opens beat the other methods hands down--good call, McDarren!

    Anyway, this led me to try another combination with the same methodology and under the same conditions, the results of which follow:
    Method 1 is while (my $line = <TOP>) with open(TOP, "top -b -n 1|") and creating the hash keys as we go.
    Method 2 is while (my $line = <TOP>) with open(TOP, "top -b -n 1|") and precreating the hash keys.
    Method 1 mean:  0.557748141288757 stddev:  0.00368823257386979!
    Method 2 mean:  0.558288831710816 stddev:  0.00673477747628604!
    T-stat for mean 1 and mean 2 is 0.704155995405561
    
    A t-stat of 0.7 for r=100 gives you a confidence of between 75 and 90%; that's not bad. It might be just as much an artifact of the escape clause as anything else, but it's interesting.
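    Since the benchmark code was not posted, the following is only an inference from the numbers: the t-stats above are consistent with the two-sample formula for unequal variances, t = |m1 - m2| / sqrt(s1^2/n1 + s2^2/n2). A short sketch that recomputes two of them from the posted means and standard deviations (`t_stat` is a hypothetical helper name):

```perl
use strict;
use warnings;

# Two-sample t statistic with unequal variances:
#   t = |m1 - m2| / sqrt(s1^2/n1 + s2^2/n2)
sub t_stat {
    my ($m1, $s1, $n1, $m2, $s2, $n2) = @_;
    return abs($m1 - $m2) / sqrt($s1**2 / $n1 + $s2**2 / $n2);
}

# Means and standard deviations as posted; 100 iterations per method.
# First run, methods 1 and 2 (backticks, keys on the fly vs. precreated).
my $t_first = t_stat(0.560507924556732, 0.00320759260168446, 100,
                     0.560737266540527, 0.00290542841782241, 100);

# Second run, methods 1 and 2 (piped open, keys on the fly vs. precreated).
my $t_second = t_stat(0.557748141288757, 0.00368823257386979, 100,
                      0.558288831710816, 0.00673477747628604, 100);

printf "first run:  t = %.4f\n", $t_first;
printf "second run: t = %.4f\n", $t_second;
```

    As a rule of thumb for samples this size, |t| needs to be around 2 or more before a difference counts as significant at the 5% level, so values near 0.5 to 0.9 are weak evidence either way.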

    Code available upon request.

    Thanks again,
    T.

      OK, one last test and I promise I'm done. I tried the same as above but with 1000 iterations:
      Method 5 is while (my $line = <TOP>) with open(TOP, "top -b -n 1|") and creating the hash keys as we go.
      Method 6 is while (my $line = <TOP>) with open(TOP, "top -b -n 1|") and precreating the hash keys.
      Method 5 mean:  0.557365287780762 stddev:  0.00308031218339066!
      Method 6 mean:  0.557536187648773 stddev:  0.00521559192260659!
      T-stat for mean 5 and mean 6 is 0.892202830875346
      
      So the mean is slightly lower for creating on-the-fly, though with a t-stat of about 0.89 the difference is not significant at conventional levels; the standard deviation is also noticeably lower for that approach. Interesting.

