PerlMonks  

Converting a growing hash into an array of arrays

by madbombX (Hermit)
on Jul 14, 2006 at 16:35 UTC ( [id://561258]=perlquestion )

madbombX has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I am attempting to write a script that parses a log file for a single value. That is, it takes the amavisd-new log file, pulls out the SPAM hit points per message, and continually tails the file (using File::Tail in a forked, daemonized process), adding values to a hash as new messages come in. I am also trying to plot these values as a bar graph in order to see trends. I have settled on using GD::Graph::bars as opposed to RRDs.

My question: incrementing a value in a 'Key => Value' pair is easy, but it is not easy (at least not to me) in an array that is required to look like this:

@data = (
    [ "1.6", "2.2", "3.4", "3.6", "5.4", "6.2", "7.1", "8.1", "9.0" ],
    [ 1, 2, 5, 6, 3, 15, 4, 3, 4 ],
    [ sort { $a <=> $b } (1, 2, 5, 6, 3, 15, 4, 3, 4) ],
);

Is there a better way to do this without regenerating the array every time a message is added?

Or would it potentially be better to do this all using RRDs (even though the aspect of time that RRD takes full advantage of is irrelevant)? I just want to keep the total number of messages per point. (I know the second part of the question is barely Perl related, but many out there are more experienced than I.)

Thanks. Eric

UPDATE: Here is the shortened version of the completed (working) code.

$log = File::Tail->new( name => $MAILLOG, tail => -1 );
while ( defined( my $line = $log->read ) ) {
    IncrData( Get_Hits($line) );
    if ( ( $msgs{Total} % 200 ) == 1 ) {
        Create_Graph();
    }
}

sub IncrData {
    my $values = shift;
    if ( exists $hits{$values} ) {
        # known hit value: increment its count through the stored reference
        ${ $hits{$values} }++;
    }
    else {
        # new hit value: find its sorted position and insert it with a count of 1
        my $idx    = 0;
        my $endIdx = scalar( @{ $data[0] } );
        while ( $idx < $endIdx && $data[0][$idx] < $values ) {
            $idx++;
        }
        splice( @{ $data[0] }, $idx, 0, $values );
        splice( @{ $data[1] }, $idx, 0, 1 );
        $hits{$values} = \$data[1][$idx];
    }
}

Since the file is being tailed, I recreate the graph every 200 incoming messages. Just before the graph gets recreated, I run the sort code: @{$data[2]} = sort { $a <=> $b } @{$data[1]};

Again, thanks to all for all the help.

Replies are listed 'Best First'.
Re: Converting a growing hash into an array of arrays
by jmcada (Acolyte) on Jul 14, 2006 at 16:52 UTC
    splice?
    use Data::Dumper;

    my @data = (
        [ "1.6", "2.2", "3.4", "3.6", "5.4", "6.2", "7.1", "8.1", "9.0" ],
        [ 1, 2, 5, 6, 3, 15, 4, 3, 4 ],
        [ sort { $a <=> $b } (1, 2, 5, 6, 3, 15, 4, 3, 4) ],
    );

    splice( @{ $data[0] }, 1, 0, "2.1" );
    splice( @{ $data[1] }, 1, 0, 7 );
    $data[2] = [ sort { $a <=> $b } @{ $data[1] } ];

    print Dumper( \@data );

      I think I should have been a little more specific. When a new message comes through the log, I have to increment the message count for its hit value by one. So if I have 3 messages that scored 2.1, the hash entry would look like $hits{"2.1"} = 3, and when a new message comes in with a hit value of 2.1, it becomes $hits{"2.1"} = 4. I know how to do this with a hash ($hits{"2.1"}++), but is there a way to do this with the multi-dimensional array I listed above? I know splice will work for the one array ($data[0]), but I need to adjust the corresponding value in the other arrays.

      Thanks.

      Eric

        If I'm understanding this correctly, you need to increment the value in the second sub array based on the value found in the corresponding position on the first sub array. If so, this is probably a little over-thinking it, but it should work:
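        my @data = (
            [ "1.6", "2.2", "3.4", "3.6", "5.4", "6.2", "7.1", "8.1", "9.0" ],
            [ 1, 2, 5, 6, 3, 15, 4, 3, 4 ],
            [ sort { $a <=> $b } (1, 2, 5, 6, 3, 15, 4, 3, 4) ],
        );

        print join( ", ", @{ $data[1] } ), "\n";
        # increment the count wherever the key in the first sub array is exactly "3.4"
        # (a string compare; an unescaped /3.4/ would also match values such as "314")
        for my $i ( 0 .. $#{ $data[0] } ) {
            $data[1]->[$i]++ if $data[0]->[$i] eq "3.4";
        }
        print join( ", ", @{ $data[1] } ), "\n";

        --(0)> perl test.pl
        1, 2, 5, 6, 3, 15, 4, 3, 4
        1, 2, 6, 6, 3, 15, 4, 3, 4
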
        Eric,

        I just don't understand why you are changing from the easy way with a hash to the hard way with an array. Do you have a problem with performance? Is it about the hash size?

        I don't know how long your array/hash will be, but if it's going to be large, RRD would be a good solution (or some other kind of database).

        Solli Moreira Honorio
        Sao Paulo - Brazil
Re: Converting a growing hash into an array of arrays
by rodion (Chaplain) on Jul 14, 2006 at 21:24 UTC
    If you've got a lot of data to work with, you really want to do your incrementing updates with a hash; otherwise you will be scanning through the elements of the $data[0] array every time you want to increment. In the worst case the number of compares goes up with the square of the number of log entries, which can get expensive. (The solutions by jmcada and holli, however, are nice and clear, and are efficient as long as the sub-arrays don't get too long. I'd go with that approach if your logs are short.)
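
To make the cost argument concrete, here is a rough sketch (not code from this thread) that counts the compares done by a scan-based increment. The names `incr_by_scan`, `incr_by_hash` and `%index`, and the 100 made-up scores, are assumptions for illustration only.

```perl
use strict;
use warnings;

# 100 hypothetical scores: "0.1" .. "10.0"
my @keys   = map { sprintf "%.1f", $_ / 10 } 1 .. 100;
my @counts = (0) x @keys;

# score => position in @keys, used only by the hash-based version
my %index;
@index{@keys} = 0 .. $#keys;

my $compares = 0;

# Scan-based increment: walk the key array until the score is found
sub incr_by_scan {
    my $score = shift;
    for my $i ( 0 .. $#keys ) {
        $compares++;
        if ( $keys[$i] eq $score ) { $counts[$i]++; return }
    }
}

# Hash-based increment: one lookup, no scan
sub incr_by_hash { $counts[ $index{ $_[0] } ]++ }

incr_by_scan($_) for @keys;    # one message per score
incr_by_hash($_) for @keys;

print "scan compares: $compares\n";
```

With one message per score, the scan does 1 + 2 + ... + 100 = 5050 compares, while the hash version does one lookup per message.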

    The sub below uses an external hash of references to the elements of $data[1], called %data_refs_hash. Using that hash you can do the increment directly. You only need to scan through the array when you're adding a new key, and you only need to do that if you're keeping the keys in order. If not, leave out the while() scan and just push the new values on the end instead of splicing them.

    sub IncrData {
        my $key = shift;
        if ( exists $data_refs_hash{$key} ) {    # if there's a hash entry
            ${ $data_refs_hash{$key} }++;        # increment it
        }
        else {    # otherwise, find a spot and put one in
            my $idx    = 0;
            my $endIdx = scalar( @{ $data[0] } );
            while ( $idx < $endIdx && $data[0][$idx] < $key ) {
                $idx++;
            }
            splice( @{ $data[0] }, $idx, 0, $key );
            splice( @{ $data[1] }, $idx, 0, 1 );
            $data_refs_hash{$key} = \$data[1][$idx];
        }
        # if you need to update the sort each time, then do
        #     @{$data[2]} = sort { $a <=> $b } @{$data[1]};
        # but you shouldn't update it until you're going to use it,
        # since it's expensive to sort for every increment
    }    # IncrData

    And here's the demo and testing portion

    Updated: Reformatted code to move testing into "readmore" and revised non-code portion
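
The demo itself is not reproduced in this copy of the thread, so what follows is a minimal stand-in sketch, not rodion's original test code. It assumes file-scoped `@data` and `%data_refs_hash`, runs a few made-up hit values through the same logic as the sub above, and prints the two parallel arrays. It relies on the fact that a reference to an array element keeps pointing at that element after splice shifts it to a new position.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed globals: two parallel arrays (keys, counts) plus a hash of
# references into the counts array, as described in the post above.
my @data = ( [], [] );
my %data_refs_hash;

sub IncrData {
    my $key = shift;
    if ( exists $data_refs_hash{$key} ) {
        ${ $data_refs_hash{$key} }++;    # known key: bump its count in place
    }
    else {
        # new key: find the first position whose key is not smaller
        my $idx = 0;
        $idx++ while $idx < @{ $data[0] } && $data[0][$idx] < $key;
        splice( @{ $data[0] }, $idx, 0, $key );
        splice( @{ $data[1] }, $idx, 0, 1 );
        $data_refs_hash{$key} = \$data[1][$idx];
    }
}

# Made-up sample hit values, deliberately out of order
IncrData($_) for qw(3.4 1.6 3.4 2.2 1.6 3.4);

print join( ", ", @{ $data[0] } ), "\n";
print join( ", ", @{ $data[1] } ), "\n";
```

With this input the two printed lines should be `1.6, 2.2, 3.4` and `2, 1, 3`: the count for 3.4 is still updated correctly even though two smaller keys were spliced in ahead of it.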
Re: Converting a growing hash into an array of arrays
by kwaping (Priest) on Jul 14, 2006 at 21:22 UTC
    How about using values %hash or a hash slice? Sample code:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my %hash = (1 => 2, 3 => 4, 5 => 6);

    my @array1 = keys %hash;
    print "@array1$/";

    my @array2 = values %hash;
    print "@array2$/";

    my @array3 = @hash{@array1};
    print "@array3$/";

    __END__
    output (hash order is not guaranteed, so the columns may come out in a different order):
    1 3 5
    2 4 6
    2 4 6

    ---
    It's all fine and dandy until someone has to look at the code.
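
Applied to the original question, the keys/values idea might look like the sketch below: keep only the hash while tailing the log, and build the array of arrays just before graphing. This is an illustration under assumptions, not code from the thread; `build_data` and the sample scores are made up, and the three-row layout simply mirrors the `@data` structure in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hits;    # hit score => number of messages with that score

# Made-up sample scores standing in for values parsed from the log
$hits{$_}++ for qw(2.1 5.4 2.1 1.6 2.1 5.4);

# Build the array of arrays only when a graph is actually needed
sub build_data {
    my ($counts) = @_;
    my @keys = sort { $a <=> $b } keys %$counts;    # labels in numeric order
    my @vals = @{$counts}{@keys};                   # hash slice, same order as @keys
    return ( \@keys, \@vals, [ sort { $a <=> $b } @vals ] );
}

my @data = build_data( \%hits );
print join( ", ", @$_ ), "\n" for @data;
```

Incrementing stays a plain `$hits{$score}++`, and the sort cost is paid once per graph rather than once per message.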
