Does Net::Ping have concurrency issues?

isotope has asked for the wisdom of the Perl Monks concerning the following question:

Net::Ping works fine in a loop over a list of targets, but does not seem to do what I want if I run several sessions simultaneously.

Warning: long post follows

I'm trying to write a script to discover hosts on a local subnet. In the past, they were discovered with ping(1) to the subnet broadcast address like so:

my @lines = `ping $self->{broadcast_ip} -b -c2 2>&1 | grep \'bytes from\'`;

...with the IP addresses parsed out of those lines, etc.
However, some of the devices on the subnet do not respond to broadcast pings, but I would still like to discover them. They will all respond to direct pings. One approach would be to ping each possible address in the subnet. If nearly every address were occupied, this would take little time; however, this network is sparsely populated, so such an approach would spend minutes waiting for timeouts on unpopulated addresses.
I thought I'd be smart and fork off a client process for each address. In theory, I could compress the total time to one timeout period (which could be as little as 5 seconds, given a local subnet), plus a slight incremental time spent forking each child, plus a 25ms delay per child to prevent network flooding. This takes less than 20 seconds, even with an empty network, versus as long as 254 * 5 = 1270 seconds (more than 20 minutes) for doing sequential pinging. That sure makes forking attractive.
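
The arithmetic behind that estimate can be checked with a short sketch. The figures are the ones quoted above (254 addresses, 5 second timeout, 25 ms stagger); fork overhead is ignored, which is an assumption, so the forked figure is a lower bound rather than a measurement.

```perl
use strict;
use warnings;

# Figures taken from the paragraph above.
my $hosts     = 254;     # usable addresses in a /24
my $timeout_s = 5;       # per-ping timeout in seconds
my $stagger_s = 0.025;   # 25 ms pause between forks

# Sequential worst case: every address times out in turn.
my $sequential = $hosts * $timeout_s;

# Forked worst case: the last child starts after all the staggers,
# then waits one full timeout. Fork overhead is not modelled.
my $forked = $hosts * $stagger_s + $timeout_s;

printf "sequential: %d s (%.1f min)\n", $sequential, $sequential / 60;
printf "forked:     %.2f s\n", $forked;
```

On an empty subnet this gives 1270 s for the sequential loop against 11.35 s for the forked version, which leaves room for fork overhead inside the 20 second estimate.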
Here's what I tried to do:

#!/usr/bin/perl -w
use strict;

use Net::Ping;
use Time::HiRes qw(usleep);
use Fcntl qw(:DEFAULT :flock);
use POSIX qw(tmpnam mkfifo);

my $master = $$;

# Create the FIFO
my $fifo;
do {
  $fifo = tmpnam();
} until mkfifo($fifo, 0666);

# Generate list of hosts to ping
my @targets;
foreach my $target (84..87) {
  push(@targets, '10.100.19.'.$target);
}

# Make the pinger
my $pinger = Net::Ping->new('icmp', 5);

# Fork off processes
my %kids;
foreach my $target (@targets) {
  my $kid = fork;
  if($kid) {
    # Parent
    $kids{$kid} = $target;
  }
  else {
    # Child
    print "Child $target started\n";
    if($pinger->ping($target)) {
      # PING! Throw it in the FIFO
      print "Found $target!\n";
      sysopen(FH, $fifo, O_WRONLY | O_APPEND)
    or die "Can't open FIFO $fifo: $!\n";
      print "Locking FIFO\n";
      flock(FH, LOCK_EX)
    or die "Can't lock FIFO $fifo: $!\n";
      print "appending to FIFO\n";
      print FH $target."\n";
      close(FH);
    }
    else {
      warn "$target: $!\n";
      print "No response from $target\n";
    }
    $pinger->close();
    print "Child $target exiting\n";
    exit();
  }
  
  # Sleep 25 ms to prevent flooding
  usleep(25_000);
}

# Cleanup processes and gather results
my @pings;

print "Parent opening FIFO\n";
sysopen(FIFO, $fifo, O_RDONLY | O_NONBLOCK)
  or die "Can't open FIFO $fifo for reading: $!\n";

print "Parent looping over remaining kids\n";
while(%kids) {
  
  print "Parent looping wait\n";
  while((my $kid = wait()) > 0) {
    print "Parent reaping $kids{$kid}\n";
    delete($kids{$kid});
    
    print "Parent reading fifo\n";
    while(defined(my $line = <FIFO>)) {
      chomp($line);
      print "Parent got $line!\n";
      push(@pings, $line);
    }
  }
}

print "Parent closing fifo\n";
close(FIFO);

foreach my $ping (@pings) {
  print "PONG: $ping\n";
}


# Delete the FIFO whenever we exit
END {
  if($$ == $master) {
    unlink($fifo)
      or die "Couldn't unlink FIFO $fifo: $!\n";
    print "$$ unlinked $fifo\n";
  }
}

To minimize the test case, I've set the targets to be 10.100.19.84 through 10.100.19.87, where .85 and .86 actually exist. The tcpdump output below shows the attempts to ping, along with successful replies from .85 and .86, but the script reports no response from anything.

09:33:29.275634 arp who-has 10.100.19.84 tell 10.100.19.1
09:33:29.305634 10.100.19.1 > 10.100.19.85: icmp: echo request (DF)
09:33:29.305634 10.100.19.85 > 10.100.19.1: icmp: echo reply
09:33:29.355634 10.100.19.1 > 10.100.19.86: icmp: echo request (DF)
09:33:29.355634 10.100.19.86 > 10.100.19.1: icmp: echo reply
09:33:29.395634 arp who-has 10.100.19.87 tell 10.100.19.1
09:33:30.275634 arp who-has 10.100.19.84 tell 10.100.19.1
09:33:30.395634 arp who-has 10.100.19.87 tell 10.100.19.1
09:33:31.275634 arp who-has 10.100.19.84 tell 10.100.19.1
09:33:31.395634 arp who-has 10.100.19.87 tell 10.100.19.1

If I increase the usleep delay from 25_000 us to 10_000_000 us, I'm essentially running it like a loop, and I get the replies for .85 and .86:

script output:

Child 10.100.19.84 started
10.100.19.84: 
No response from 10.100.19.84
Child 10.100.19.84 exiting
Child 10.100.19.85 started
Found 10.100.19.85!
Child 10.100.19.86 started
Found 10.100.19.86!
Child 10.100.19.87 started
10.100.19.87: 
No response from 10.100.19.87
Child 10.100.19.87 exiting
Parent opening FIFO
Parent looping over remaining kids
Parent looping wait
Parent reaping 10.100.19.87
Parent reading fifo
Parent reaping 10.100.19.84
Parent reading fifo
Locking FIFO
appending to FIFO
Child 10.100.19.85 exiting
Parent reaping 10.100.19.85
Parent reading fifo
Parent got 10.100.19.85!
Locking FIFO
appending to FIFO
Child 10.100.19.86 exiting
Parent reaping 10.100.19.86
Parent reading fifo
Parent got 10.100.19.86!
Parent closing fifo
PONG: 10.100.19.85
PONG: 10.100.19.86
28674 unlinked /tmp/fileMVZqps

tcpdump:
10:27:38.255634 arp who-has 10.100.19.84 tell 10.100.19.1
10:27:39.255634 arp who-has 10.100.19.84 tell 10.100.19.1
10:27:40.255634 arp who-has 10.100.19.84 tell 10.100.19.1
10:27:48.255634 10.100.19.1 > 10.100.19.85: icmp: echo request (DF)
10:27:48.255634 10.100.19.85 > 10.100.19.1: icmp: echo reply
10:27:51.665634 arp who-has 10.100.19.86 tell 10.100.19.1
10:27:51.665634 arp reply 10.100.19.86 is-at 0:0:50:b:b3:7f
10:27:53.255634 arp who-has 10.100.19.85 tell 10.100.19.1
10:27:53.255634 arp reply 10.100.19.85 is-at 0:0:50:b:b3:7f
10:27:58.275634 10.100.19.1 > 10.100.19.86: icmp: echo request (DF)
10:27:58.275634 10.100.19.86 > 10.100.19.1: icmp: echo reply
10:28:02.375634 arp who-has 10.100.19.1 tell 10.100.19.85
10:28:02.375634 arp reply 10.100.19.1 is-at 0:7:e9:9:8a:dd
10:28:08.295634 arp who-has 10.100.19.87 tell 10.100.19.1
10:28:09.295634 arp who-has 10.100.19.87 tell 10.100.19.1
10:28:10.295634 arp who-has 10.100.19.87 tell 10.100.19.1

My current hypothesis is that Net::Ping isn't smart enough to leave alone ICMP replies that don't belong to the current process, and that perhaps the other children are grabbing and discarding the replies that should have made it to the children pinging .85 and .86. Any thoughts?

--isotope


Replies are listed 'Best First'.
Re: Does Net::Ping have concurrency issues? by ehdonhon (Curate) on May 30, 2003 at 20:07 UTC
Suggestions:
Use Parallel::ForkManager.
Splice off a segment of @targets for each child instead of only doing one target per child. It will be more efficient because it reduces the overhead of forking processes.
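
A minimal sketch of the batching idea, using only core Perl. The helper name `batches` and the batch size of 32 are illustrative assumptions, not something from the thread; each batch would then be handed to one child, which loops over it with its own Net::Ping object.

```perl
use strict;
use warnings;

# Hypothetical helper: split a target list into batches of at most $size.
sub batches {
    my ($size, @list) = @_;
    my @out;
    # splice removes the first $size elements (or whatever is left) each pass.
    push @out, [ splice(@list, 0, $size) ] while @list;
    return @out;
}

my @targets = map { "10.100.19.$_" } 1 .. 254;
my @work    = batches(32, @targets);

printf "%d batches, last one has %d targets\n",
    scalar @work, scalar @{ $work[-1] };
```

With 254 targets and a batch size of 32 this forks 8 children instead of 254. The trade-off is that each child pings its batch sequentially, so the worst-case run time grows with the batch size.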
Re: Re: Does Net::Ping have concurrency issues? by phydeauxarff (Priest) on May 30, 2003 at 21:19 UTC
We had a similar issue where we needed to run through about 40 thousand IP addresses to make sure folks weren't using IPs not assigned to them. Using Parallel::ForkManager made this much simpler.
For more detail on the method I used to set up the MySQL connection, check out Re: Re: Re: Secure ways to use DBI?
The code below pulls the IPs from a database and runs through them 50 at a time. It runs quite well on an Ultra-Sparc 60.

#!/usr/bin/perl
use Net::Ping;
use DBI ();
use Data_config;
use Parallel::ForkManager;

my $MAX_PROCESSES = 50;

## Create a database handle ##
## The actual user/pass info is in Data_config.pl ##
my $DSN = "DBI:$DBDRIVER:database=$DATABASE:host=$DBHOST:port=$DBPORT";
my $DBH = DBI->connect($DSN, $USERNAME, $PASSWORD,
                       { RaiseError => 1, PrintError => 1 });

$|=1;
my $PING_TIMEOUT = 2;

my $pm = new Parallel::ForkManager($MAX_PROCESSES);

foreach my $IP (map { $_->[0] }
                @{ $DBH->selectall_arrayref("SELECT ip_address FROM ips") }) {
  # Forks and returns the pid for the child:
  my $pid = $pm->start and next;

  # Each child creates its own pinger after the fork
  my $ping = new Net::Ping ("icmp");
  if ($ping->ping($IP, $PING_TIMEOUT)) {
    print "$IP Gotcha! \n";
  }
  else {
    print "$IP \n";
  }
  $ping->close();
  $pm->finish; # Terminates the child process
}
$pm->wait_all_children;
Re: Does Net::Ping have concurrency issues? by isotope (Deacon) on May 30, 2003 at 18:05 UTC
Ok, duh, moved Net::Ping->new() to within the child block so each child instantiates its own $pinger. Now it works. I guess Net::Ping tracks the responses by PID.

Update: Ok, brain fart... You're right, Thelonius, it's using one socket per instantiation instead of creating a new one for each ping. I'll blame my allergy attack.

--isotope
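
The fix described here can be sketched as follows. This is not the original script: the helper name `ping_in_children` and the factory argument are hypothetical, results travel over an anonymous pipe rather than the FIFO used in the question, and the 25 ms stagger is left out for brevity. The one point it is meant to show is that the Net::Ping object is created after the fork, so each child owns its socket.

```perl
use strict;
use warnings;
use Net::Ping;

# Hypothetical helper: ping each target from its own child process.
# $make_pinger is called AFTER the fork, so no socket is shared.
sub ping_in_children {
    my ($targets, $make_pinger) = @_;
    pipe(my $r, my $w) or die "pipe: $!";
    my @pids;
    for my $target (@$targets) {
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            close $r;
            my $pinger = $make_pinger->();        # per-child object and socket
            my $alive  = $pinger->ping($target);
            $pinger->close();
            syswrite($w, "$target\n") if $alive;  # one short write per child
            exit 0;
        }
        push @pids, $pid;
    }
    close $w;                    # parent keeps only the read end
    chomp(my @found = <$r>);     # reads until every child has exited
    waitpid($_, 0) for @pids;
    return sort @found;
}

# Real use needs root for ICMP; pass the addresses on the command line.
if (@ARGV) {
    print "PONG: $_\n"
        for ping_in_children(\@ARGV, sub { Net::Ping->new('icmp', 5) });
}
```

Passing the constructor as a code reference also makes it easy to swap in a stub pinger when trying the forking logic without root privileges.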
Re: Re: Does Net::Ping have concurrency issues? by Thelonius (Priest) on May 30, 2003 at 18:17 UTC
"I guess Net::Ping tracks the responses by PID."

No, not exactly. Net::Ping opens a socket, which is like a file descriptor. The socket is bound to a given port in the chosen protocol. When you fork, the socket (like any file descriptor) is shared by the children. When a ping response comes back, the operating system network driver looks at the port number in the packet to determine where to put the data. All your processes are trying to read the same port, but only one is going to get it.
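
The shared-descriptor effect can be illustrated without root. The sketch below is an analogy, not Net::Ping's code: it assumes a Unix datagram socketpair as a stand-in for the ICMP socket, and three children that all inherit the same reading end. (ICMP itself carries no port numbers, so "port" above is probably best read loosely as the socket's receive queue.)

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;
use IO::Select;

# Created BEFORE forking, so every child inherits the same reading end.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_DGRAM, PF_UNSPEC)
    or die "socketpair: $!";
$reader->blocking(0);   # a child that loses the race must not hang in recv
pipe(my $report_r, my $report_w) or die "pipe: $!";
$report_w->autoflush(1);

my @pids;
for my $n (1 .. 3) {
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Wait up to 1 s for "the reply" on the inherited socket.
        my $buf   = '';
        my @ready = IO::Select->new($reader)->can_read(1);
        if (@ready && defined recv($reader, $buf, 64, 0)) {
            print {$report_w} "child $n read: $buf\n";
        }
        exit 0;
    }
    push @pids, $pid;
}

# One "reply" arrives for three waiting readers.
defined send($writer, "echo reply", 0) or die "send: $!";
close $report_w;                 # so the read below sees EOF
my @winners = <$report_r>;
waitpid($_, 0) for @pids;

print @winners;
printf "%d of %d children saw the datagram\n", scalar @winners, scalar @pids;
```

Only one child reports the datagram; which one is up to the scheduler. In the forked ping script the child that wins the read is not necessarily the one waiting for that host's reply, which is consistent with the tcpdump trace showing replies that no child reported.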
Re: Does Net::Ping have concurrency issues? by hardburn (Abbot) on May 30, 2003 at 18:05 UTC
Are you planning on pinging an entire /24? That's an awful lot of forking. Try perl 5.8.0 threads instead, which (hopefully) are a little more scalable than forking.

----
I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
-- Schemer

Note: All code is untested, unless otherwise stated
Re: Does Net::Ping have concurrency issues? by rob_au (Abbot) on May 30, 2003 at 23:04 UTC
I think you may find the node Time-Slice Concurrent Ping of interest. In it I posted some code that can be used to ping multiple hosts concurrently, with no threads, no forking and no external binaries.

perl -le 'print+unpack"N",pack"B32","00000000000000000000001001100011"'
Re: Does Net::Ping have concurrency issues? by Aristotle (Chancellor) on May 30, 2003 at 20:17 UTC
Without looking too far into your post, I think you should probably give fping a whirl instead of rolling your own.

Makeshifts last the longest.
Re: Re: Does Net::Ping have concurrency issues? by isotope (Deacon) on Jun 04, 2003 at 17:19 UTC
fping is S-L-O-W... this is the best run I had:

$ time /usr/local/sbin/fping -r1 -g 10.100.19.1/24 -a 2> /dev/null
10.100.19.1
10.100.19.50
10.100.19.85
10.100.19.87
10.100.19.200

real    0m18.816s
user    0m0.010s
sys     0m0.000s

Maybe I did something wrong, but I could only get it down to one retry, and the default sure isn't zero, despite the --help claim.

I tried Time-Slice Concurrent Ping with the modifications I posted in that thread and tuning the parameters (allow 254 outstanding pings, 1 second timeout), and this is what I get:

$ sudo time ./multiping.pl
Reply time for 10.100.19.1 - 3.007 seconds
Reply time for 10.100.19.50 - 3.000 seconds
Reply time for 10.100.19.85 - 2.994 seconds
Reply time for 10.100.19.87 - 2.993 seconds
Reply time for 10.100.19.200 - 2.971 seconds
PONG: 10.100.19.1
PONG: 10.100.19.50
PONG: 10.100.19.85
PONG: 10.100.19.87
PONG: 10.100.19.200
0.15user 0.02system 0:04.11elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (394major+191minor)pagefaults 0swaps

With this one, it's always less than 4.25 seconds, which is much more useful to me.

--isotope

