Scan ARP cache dump - memory hog

seanbo has asked for the wisdom of the Perl Monks concerning the following question:

WARNING!! I did not use strict

With that said... the code listed below reads a file and extracts IP addresses so we can determine which addresses haven't been used in a while and can be reclaimed and handed out again.

As you will see from the script, the file has about 250K records. When I run this script, I deplete all memory on our server (which happens to be our primary DNS server). Aside from running it on a non-production box, do any of you have suggestions for improving memory use and speed?

I am having a HUGE problem with the line:

my @matches = grep {defined $_} map { /^($subnet\.\d+)/ and $1; } @lines;

Here is the whole program:

#!/usr/bin/perl -w

  use Getopt::Std;
  use Net::Netmask;

  my $subnet;
  my $ARP = '/tmp/arp.bak';

  getopt('sm');

  if ($opt_s ne '') {
    $subnet = $opt_s;
  } else {
    die "Please supply a subnet to scan.";
  }

  my $block = new Net::Netmask($subnet);
  my @range = $block->enumerate();

  #We need to get our temp file
  system("tail -250000 /tmp/arp > $ARP");

  my @lines = <DATA>;
  chomp(@lines);

  #Get the IPs that are preset with first 3 octets matching
  my @matches = grep {defined $_} map { /^($subnet\.\d+)/ and $1; } @lines;

  #Get the difference of the block and the IP's found
  my @intersection = my @difference = ();
  my %count = ();
  foreach $element (@matches, @range) { $count{$element}++ }
  foreach $element (keys %count) {
      push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
  }

  #make unique and then sort
  undef %pre;
  @pre{@difference} = ();
  @clean = keys %pre;
  my @sorted = sort {
    pack('C4' => $a =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/)
    cmp
    pack('C4' => $b =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/)
  } @clean;

  print join("\n",@sorted);

__DATA__
192.168.87.134       0x00804F429376        Vl2n22
192.168.87.135       0x00804F429466        Vl2n22
192.168.87.136       0x00804F4288E1        Vl2n22
192.168.87.138       0x00804F425776        Vl2n22
192.168.87.139       0x00804F461812        Vl2n22
192.168.87.144       0x0060B0280D05        Vl2n22
192.168.87.146       0x0001E62B36E7        Vl2n22
192.168.87.147       0x0060B030889D        Vl2n22
192.168.87.148       0x0060B0521032        Vl2n22
192.168.87.149       0x00804F45F98E        Vl2n22
192.168.87.150       0x00804F559E8D        Vl2n22
192.168.87.151       0x00804F4627D8        Vl2n22
192.168.87.153       0x00804F58469D        Vl2n22
192.168.87.156       0x0060B032E627        Vl2n22
192.168.87.157       0x00804F4627B2        Vl2n22
192.168.87.159       0x00804F462788        Vl2n22
192.168.89.163       0x00008086DEE1        Vl2n22
192.168.89.164       0x00804F41D3F7        Vl2n22
192.168.89.165       0x00804F4294F2        Vl2n22
192.168.89.167       0x00804F1E81F9        Vl2n22
192.168.87.168       0x00902784B532        Vl2n22
192.168.87.169       0x0060B012F54E        Vl2n22
192.168.89.171       0x0040883FB2D1        Vl2n22
192.168.87.172       0x0001E63D35BE        Vl2n22
192.168.89.173       0x0060B0F270BB        Vl2n22
192.168.87.250       0x00405801279B        Vl2n22
192.168.87.251       0x0040580008D6        Vl2n22

perl -e 'print reverse qw/o b n a e s/;'


Replies are listed 'Best First'.
Re: Scan ARP cache dump - memory hog by ferrency (Deacon) on Jul 03, 2002 at 19:04 UTC
Currently you're slurping the entire data set into memory at once; not only that, you're copying possibly huge chunks of it several more times. If you can build your code around a `while()` loop and process one line at a time instead of slurping the entire file, you'd be much better off, memory-wise.

  # instead of this:
  my @lines = <DATA>;

  # do something like this:
  while (my $line = <DATA>) {
    ...
  }

Even if you only build `@matches` in that loop and keep the rest of the code the same, you may be much better off (assuming you have few matches compared to the size of the dataset). Letting arrays go away once you're done with them (use `my` and arrange the code so they go out of lexical scope) will also help with memory reuse.

If you can explain more clearly what this code is supposed to do, we might be able to find a much more straightforward solution. As it is, the code seems to be doing the same thing several times over in different ways before printing its final results.

Alan
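
A minimal sketch of the line-by-line approach described above, assuming the input has one ARP entry per line with the IP address first. The helper name `collect_matches` and the in-memory sample are hypothetical and only there to make the example runnable; the prefix is quoted with `\Q...\E` so its dots match literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): read a filehandle one
# line at a time and keep only the addresses under the given three-octet
# prefix, e.g. '192.168.87'.
sub collect_matches {
    my ($fh, $prefix) = @_;

    # \Q...\E quotes the dots in the prefix so they only match literal dots.
    my $regex = qr/^(\Q$prefix\E\.\d+)/;

    my @matches;
    while (my $line = <$fh>) {
        push @matches, $1 if $line =~ $regex;
    }
    return \@matches;
}

# Demo data held in a string so the sketch runs without /tmp/arp.
my $sample = <<'END';
192.168.87.134       0x00804F429376        Vl2n22
192.168.89.163       0x00008086DEE1        Vl2n22
192.168.87.135       0x00804F429466        Vl2n22
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $matches = collect_matches($fh, '192.168.87');
close $fh;

print "$_\n" for @$matches;
```

Only the matching addresses are ever held in memory; each input line is discarded as soon as the next one is read.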
Re: Re: Scan ARP cache dump - memory hog by seanbo (Chaplain) on Jul 04, 2002 at 02:52 UTC
Just to further explain what I am trying to achieve: we currently manage a couple hundred subnets at my job. Each Monday morning, we get ARP cache dumps from all of our routers sent to us. People send us requests for IP addresses and DNS names. There are network admins who are notorious for not returning IP addresses. What I am trying to do is take the last few months' worth of ARP information (that is the tail -250000... command; it's just an approximation). We use that to determine which IPs have had no activity for a while, remove the allocation from our records, and notify the admin the address was assigned to that we have reclaimed it. Thanks for the input!

perl -e 'print reverse qw/o b n a e s/;'
Re: Scan ARP cache dump - memory hog by flocto (Pilgrim) on Jul 03, 2002 at 19:07 UTC
If you're concerned about memory usage, you shouldn't read the entire file at once. Read it line by line and count the IPs in a hash, so you don't get duplicate entries. And, of course, use strict, but I guess you knew that already :) Anyhow, here's a snippet that came to mind:

  #!/usr/bin/perl -w
  use strict;

  my $subnet = '192.168.87';
  my %data = ();
  my $debug = 0;  # set to 1 to report lines that don't match

  # precompile regex for performance..
  my $regex = qr#^($subnet\.\d+)#;

  # read file line by line
  while (my $line = <DATA>) {
    chomp($line);
    if ($line =~ $regex) {
      $data{$1}++;
    } elsif ($debug) {
      print STDERR "Didn't match: $line\n";
    }
  }

If you really do want to stick to your own code, your line is better written as (it's not very nice either..):

  my @matches = grep { m/$regex/ } @lines;
  ($_) = m/$regex/ foreach @matches;

Regards,
-octo-
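
Once the addresses are counted in a hash, finding the ones with no activity is a single pass over the candidate list. The sketch below assumes that step; the helper name `unused_addresses` and the five-address candidate list are hypothetical stand-ins for whatever enumerates the real subnet.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): given a hash ref of addresses
# seen in the ARP dump and a list of candidate addresses, return the
# candidates that never appeared. The order of the candidates is preserved.
sub unused_addresses {
    my ($seen, @candidates) = @_;
    return grep { !exists $seen->{$_} } @candidates;
}

# Assumed sample data: hit counts per address, as built by a counting loop.
my %seen = (
    '192.168.87.2' => 3,
    '192.168.87.4' => 1,
);

# Assumed candidate list; a real run would enumerate the whole subnet.
my @candidates = map { "192.168.87.$_" } 1 .. 5;

print "$_\n" for unused_addresses(\%seen, @candidates);
```

Because the hash lookup is constant time, this avoids building a second counting hash over both lists.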
Re: Re: Scan ARP cache dump - memory hog by seanbo (Chaplain) on Jul 04, 2002 at 02:57 UTC
I'll give this a try. I was trying to be elegant and do things faster than brute-forcing my way line by line, but I guess you see where that got me... :-( Yeah, I know I really should have been using strict, and I felt like an idiot posting the code without it (thus the warning up top). Thanks for your input! When I clean up the code (and am using strict like I should be), I will repost it.

perl -e 'print reverse qw/o b n a e s/;'
Re: Scan ARP cache dump - memory hog by seanbo (Chaplain) on Jul 04, 2002 at 14:15 UTC
OK, here is the updated code. I modified it to read from a file instead of <DATA>. And guess what?!?! It runs under strict!! (Oh, it works too)

++ to ferrency and flocto for their help!!

  #!/usr/bin/perl -w

  use strict;
  use vars qw/ $opt_s /;
  use Getopt::Std;
  use Net::Netmask;

  my $subnet;
  my %data = ();
  my $ARP = '/tmp/arp.bak';

  getopt('s');

  if (defined($opt_s) && $opt_s ne '') {
    $subnet = $opt_s;
  } else {
    help();
  }

  my $block = new Net::Netmask($subnet);
  if (defined($block->{'ERROR'})) { die "Invalid subnet/mask combination."}
  my @range = $block->enumerate();

  #We need to get our temp file
  system("tail -250000 /tmp/arp > $ARP");

  #Open the file and read it line by line
  open(FH, $ARP) || die ("Couldn't open the arp file $ARP");
  while (<FH>) {
    if (/^((?:\d{1,3}\.){3}\d{1,3})/) {
      if ($block->match($1)) {
        $data{$1}++;
      }
    }
  }
  close FH;

  #I need to specifically remove the network, gateway and broadcast
  #addresses since we don't care about those.
  #First remove from enumeration array
  shift @range; #Network Address
  shift @range; #Gateway Address
  pop @range;   #Broadcast Address

  #Now remove from the IP's found in the ARP cache
  delete $data{$block->base()};
  delete $data{$block->nth(1)};
  delete $data{$block->broadcast()};

  my @matches = keys %data;

  #Compare the array of matched IPs to the enumerated Netblock
  my @intersection = my @difference = ();
  undef %data;
  foreach my $element (@matches, @range) { $data{$element}++ }
  foreach my $element (keys %data) {
    push @{ $data{$element} > 1 ? \@intersection : \@difference }, $element;
  }

  #Now I'd like to sort the IPs (a little Schwartzian Transform action here...)
  #Each address is packed into 4 bytes so a plain string sort orders it numerically
  my @sorted = map  { join '.', unpack 'C4', $_ }
               sort
               map  { pack 'C4', split /\./ }
               @difference;

  print "Addresses that are candidates for reclaim in:\n";
  print $block->desc(), "\n\n";
  print join("\n",@sorted), "\n";

  sub help {
    print <<'HELP';
You must supply a valid subnet. Acceptable formats are as follows:
  192.168.1.0/24   <--- The preferred form.
  192.168.1.0:255.255.255.0
  192.168.1.0-255.255.255.0
syntax: arpscan.pl -s 192.168.1.0/24
HELP
    exit(1);
  }

Update: Modified the IP sort to use a Schwartzian Transform so I can have them truly sorted like IPs should be.

Update: Added a little help, support for VLSMs, and stripped out unneeded addresses (network, gateway, and broadcast). Note: the gateway is specific to our organization; yours may use a different address. We use network + 1. Thanks to tye, belg4mit, and arturo for your help with my regex issue. /msg me if I left you out.

Update: Added code to check for an invalid IP address/mask combination, since some joker here already tried to enter something like 192.168.1.256/24.

Update: Fixed the regex (read as: removed a space that I would have never seen in a million years!) ++tye

perl -e 'print reverse qw/o b n a e s/;'
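
For comparison, here is a sketch of the IP sort written as a decorate-sort-undecorate pipeline, which is the shape usually meant by "Schwartzian Transform". It assumes well-formed dotted-quad IPv4 strings; the helper name `sort_ips` is hypothetical and not part of the posted script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the posted script): sort dotted-quad IPv4
# strings in numeric order.
#   1. decorate:   pair each address with a 4-byte packed sort key
#   2. sort:       compare the packed keys as strings
#   3. undecorate: pull the original address back out
sub sort_ips {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ pack('C4', split /\./, $_), $_ ] }
           @_;
}

print "$_\n" for sort_ips(qw(192.168.87.20 192.168.87.3 192.168.9.1));
```

A plain `sort` on the strings would put 192.168.87.20 ahead of 192.168.87.3; comparing the packed keys orders each octet by its numeric value instead.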

