PerlMonks  

Re: Reducing memory usage while matching log entries

by duff (Parson)
on Feb 01, 2006 at 14:53 UTC ( [id://527065] )


in reply to Reducing memory usage while matching log entries

I'm wondering what the better way is. :)

Rather than storing the whole file in memory, use a 2-pass solution: in the first pass, gather the locking information; in the second pass, remove the appropriate lines. Each pass is just a while loop that holds only one line at a time in memory.
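In skeleton form, something like this (a rough sketch; $logfile and the actual matching logic are placeholders):

my %drop;    # line numbers to remove, filled in during pass 1

open my $in, '<', $logfile or die "Can't read $logfile: $!";
while (<$in>) {
    # pass 1: inspect the line and record $. in %drop if it should go
}
close $in;

open $in, '<', $logfile or die "Can't read $logfile: $!";
while (<$in>) {
    print unless $drop{$.};    # pass 2: echo only the lines we keep
}
close $in;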

Update: Another option would be to use Tie::File. It lets you treat the file as if it were an array but manages all of the gory details for you. I just looked at the man page and it has an option to limit the amount of memory it consumes.
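Something like this might be all the code it takes (a rough, untested sketch; the memory option caps Tie::File's read cache, and bear in mind that changes to the tied array are written back to the file itself):

use strict;
use warnings;
use Tie::File;

my $logfile = shift;
# Cap the read cache at roughly 10 MB; deleting array elements deletes
# the corresponding lines from the file on disk.
tie my @log, 'Tie::File', $logfile, memory => 10_000_000
    or die "Can't tie $logfile: $!";
splice @log, 9, 1;    # e.g. remove what was line 10 of the file
untie @log;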


Replies are listed 'Best First'.
Re^2: Reducing memory usage while matching log entries
by matt.tovey (Beadle) on Feb 01, 2006 at 16:07 UTC
    I like this idea, but I'm having trouble getting a 2-pass solution to work nicely. In the first pass I identify which line numbers need to be removed, but then I'll need to munge that list into some convenient form to use while reading the file again in the second pass...

    Tie::File looks quite handy, but since I don't want to alter the original log file, I'd have to copy it first and then reduce it, which could be a problem if disk-space is tight.

    So far, storing the whole file in a hash (so that I can properly delete the lines no longer required) seems like a winner.
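    Roughly like this (a sketch of what I mean; the matched pair of line numbers is just a placeholder for whatever the analysis finds):

    use strict;
    use warnings;

    my $logfile = shift;
    my %line;                            # line number => line text
    open my $fh, '<', $logfile or die "Can't read $logfile: $!";
    $line{$.} = $_ while <$fh>;
    close $fh;

    my @matched = (17, 42);              # hypothetical lock/unlock pair
    delete @line{@matched};              # really frees those two lines
    print map { $line{$_} } sort { $a <=> $b } keys %line;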

    Salva: The 'sort' idea has the problem that a given lock ID can be locked and unlocked multiple times in the file, so the sorted values won't always be 'locking' followed by 'unlocked'. Plus the contents of the logfile need to remain in the correct order for analysis of the remaining contents... but thanks!

      I like this idea, but I'm having trouble getting a 2-pass solution to work nicely. In the first pass I identify which line numbers need to be removed, but then I'll need to munge that list into some convenient form to use while reading the file again in the second pass...

      I don't understand your difficulty. Pass #1 records line numbers, pass #2 writes all the lines that haven't been recorded. Here's some code based on your original but with a few tweaks:

      #!/usr/bin/perl
      use strict;
      use warnings;

      # Stand-in for the Log routine from the original script (not shown
      # here): level 0 is an error, higher levels are debug chatter.
      sub Log { my ($level, $msg) = @_; warn "$msg\n" if $level == 0; }

      die "Usage: $0 <filename>\n" unless @ARGV == 1;
      my $logfile = shift;

      # Pass #1 : gather line numbers to be deleted.
      my %locks;          # Hash of currently open locks.
      my @unlock_lines;   # lines to rid ourselves of
      open(LOGFILE, $logfile) or die "Can't read $logfile - $!\n";
      while (<LOGFILE>) {
          Log 2, "Analysing line $.";
          next unless /Mutex\((.*?)\)::(\w+)/;
          my ($address, $action) = ($1, $2);
          if ($action eq 'locking') {
              Log 2, "Address $address locked at line $.";
              if (defined $locks{$address}) {
                  Log 0, "ERROR: Address $address locked at line $., but already locked at line $locks{$address}.";
              }
              $locks{$address} = $.;
          }
          if ($action eq 'unlocked') {
              Log 2, "Address $address unlocked at line $.";
              unless (defined $locks{$address}) {
                  Log 0, "ERROR: Address $address not locked, but unlocked at line $..";
              }
              else {
                  push @unlock_lines, $., delete $locks{$address};
              }
          }
      }
      close LOGFILE;

      # Sort the line numbers that we've accumulated because we put them
      # in unordered. This allows us to make just one more pass through
      # the file to remove the lines.
      @unlock_lines = sort { $a <=> $b } @unlock_lines;

      # Pass #2: output all but the lines we're not interested in.
      my $rmline = shift @unlock_lines;
      open(LOGFILE, $logfile) or die "Can't read $logfile - $!\n";
      while (<LOGFILE>) {
          if (defined $rmline && $. == $rmline) {
              $rmline = shift @unlock_lines;
              next;
          }
          print;
      }
      close LOGFILE;
            I like this idea, but I'm having trouble getting a 2-pass solution to work nicely...

          I don't understand your difficulty...

        Sorry, I had to leave work about the time of my last message yesterday (well, actually 10 minutes _before_ my last message!), but didn't want to disappear without writing back. And I'm not good at coding under stress (and not fantastic the rest of the time either!).

        Anyway, thanks for taking the time to write out the code there. Strangely enough, I tried it this morning, and the memory usage of this is actually higher than the original, at least as Linux measures it! I'm processing a 65MB test file - after the first pass the script is consuming 100MB of memory, and 200MB after the sort!
        With this test file, @unlock_lines ends up with 800000 entries, but still I was surprised. My original script used (RSS & VSZ) 160MB, and using a hash to store the file in memory (so as to properly free the deleted lines) brings it down to 110MB...

        Thanks also for '$.' - I didn't know about that one!
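        (For anyone following along: $. holds Perl's current input line number for the last filehandle read, e.g.:)

        while (<>) { print "$.: $_" }    # prints a numbered copy of the input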

      The 'sort' idea has the problem that a given lock ID can be locked and unlocked multiple times in the file, so the sorted values won't always be 'locking' followed by 'unlocked'.

      That shouldn't be a problem as long as you use a stable sort implementation, or alternatively use the line number or a timestamp as a secondary sort key.
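      For instance, something along these lines (a sketch with assumed field names; each entry keeps its original line number):

      # Assumed shape: one hashref per log line, carrying the lock
      # address, the action, and the original line number.
      my @entries = (
          { address => '0xdead', action => 'locking',  line => 1 },
          { address => '0xdead', action => 'unlocked', line => 2 },
          { address => '0xdead', action => 'locking',  line => 5 },
      );
      # Sort by address first, then by line number, so repeated
      # lock/unlock cycles of the same address keep their order.
      my @sorted = sort {
          $a->{address} cmp $b->{address}
              ||
          $a->{line} <=> $b->{line}
      } @entries;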
