Net::Ping works fine when I call it in a loop over a list of targets, but replies seem to go missing when I run several ping sessions simultaneously in forked processes.
I'm trying to write a script to discover hosts on a local subnet. In the past, they were discovered with ping(1) to the subnet broadcast address like so:
my @lines = `ping $self->{broadcast_ip} -b -c2 2>&1 | grep 'bytes from'`;
...with the IP addresses parsed out of those lines, etc.
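The parsing step could look roughly like this. It is a minimal sketch that assumes the iputils-style reply lines printed by Linux ping(1), such as "64 bytes from 10.100.19.85: icmp_seq=1 ttl=64 time=0.3 ms"; other ping implementations format these lines differently, and `responders` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Extract unique responder addresses from ping(1) reply lines.
# Assumes iputils-style output; the optional "name (" prefix covers
# lines like "64 bytes from host.example (10.0.0.1): ...".
sub responders {
    my @lines = @_;
    my (%seen, @ips);
    for my $line (@lines) {
        next unless $line =~ /bytes from (?:\S+ \()?(\d{1,3}(?:\.\d{1,3}){3})/;
        # With -c2 a host usually answers twice; keep the first sighting only.
        push @ips, $1 unless $seen{$1}++;
    }
    return @ips;
}
```

With the backtick output above, `my @ips = responders(@lines);` yields each responding address once, in the order it first replied.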
However, some of the devices on the subnet do not respond to broadcast pings, and I would still like to discover them. They all respond to direct pings, so one approach would be to ping each possible address in the subnet. If nearly every address were occupied, this would be quick; however, this network is sparsely populated, so a sequential scan would spend minutes waiting for timeouts on unoccupied addresses.
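Enumerating "each possible address in the subnet" is mechanical. A minimal sketch, assuming an IPv4 network given as a dotted-quad address plus a prefix length of 30 or less (a /24 matches the 254-address estimate in the next paragraph); `subnet_hosts` is a hypothetical helper, not part of Net::Ping.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# List every usable host address in an IPv4 subnet, skipping the
# network and broadcast addresses. Assumes $prefix <= 30.
sub subnet_hosts {
    my ($network, $prefix) = @_;
    my $packed = inet_aton($network)
        // die "Bad IPv4 address: $network\n";
    my $base = unpack('N', $packed);
    my $size = 2 ** (32 - $prefix);
    # Clear the host bits so any address inside the subnet works.
    $base &= ~($size - 1) & 0xFFFFFFFF;
    return map { inet_ntoa(pack('N', $base + $_)) } 1 .. $size - 2;
}
```

For example, `subnet_hosts('10.100.19.0', 24)` returns 10.100.19.1 through 10.100.19.254, which could stand in for the hard-coded 84..87 range in the test script below.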
I thought I'd be smart and fork off a child process for each address. In theory, this compresses the total time to one timeout period (which could be as little as 5 seconds on a local subnet), plus a small per-child forking cost, plus a 25 ms delay between children to avoid flooding the network. That comes to well under 20 seconds even on an empty network, versus as much as 254 * 5 = 1270 seconds (more than 20 minutes) for sequential pinging.
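The estimate above, written out with N = 254 candidate addresses, a 5 s timeout, and a 25 ms stagger between children (fork overhead ignored):

```latex
\begin{aligned}
T_{\text{parallel}} &\approx t_{\text{timeout}} + N\,t_{\text{delay}} = 5 + 254 \times 0.025 \approx 11.4\ \text{s} \\
T_{\text{sequential}} &\approx N\,t_{\text{timeout}} = 254 \times 5 = 1270\ \text{s} \approx 21\ \text{min}
\end{aligned}
```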
That sure makes forking attractive.
Here's what I tried to do:
#!/usr/bin/perl -w
use strict;
use Net::Ping;
use Time::HiRes qw(usleep);
use Fcntl qw(:DEFAULT :flock);
use POSIX qw(tmpnam mkfifo);

my $master = $$;

# Create the FIFO
my $fifo;
do {
    $fifo = tmpnam();
} until mkfifo($fifo, 0666);

# Generate list of hosts to ping
my @targets;
foreach my $target (84..87) {
    push(@targets, '10.100.19.'.$target);
}

# Make the pinger (created once, in the parent; every child forked
# below inherits a copy of it, including its already-open socket)
my $pinger = Net::Ping->new('icmp', 5);

# Fork off processes
my %kids;
foreach my $target (@targets) {
    my $kid = fork;
    die "Can't fork for $target: $!\n" unless defined $kid;
    if($kid) {
        # Parent
        $kids{$kid} = $target;
    }
    else {
        # Child
        print "Child $target started\n";
        if($pinger->ping($target)) {
            # PING! Throw it in the FIFO
            print "Found $target!\n";
            sysopen(FH, $fifo, O_WRONLY | O_APPEND)
                or die "Can't open FIFO $fifo: $!\n";
            print "Locking FIFO\n";
            flock(FH, LOCK_EX)
                or die "Can't lock FIFO $fifo: $!\n";
            print "appending to FIFO\n";
            print FH $target."\n";
            close(FH);
        }
        else {
            warn "$target: $!\n";
            print "No response from $target\n";
        }
        $pinger->close();
        print "Child $target exiting\n";
        exit();
    }
    # Sleep 25 ms to prevent flooding
    usleep(25_000);
}

# Cleanup processes and gather results
my @pings;
print "Parent opening FIFO\n";
sysopen(FIFO, $fifo, O_RDONLY | O_NONBLOCK)
    or die "Can't open FIFO $fifo for reading: $!\n";
print "Parent looping over remaining kids\n";
while(%kids) {
    print "Parent looping wait\n";
    while((my $kid = wait()) > 0) {
        print "Parent reaping $kids{$kid}\n";
        delete($kids{$kid});
        print "Parent reading fifo\n";
        while(defined(my $line = <FIFO>)) {
            chomp($line);
            print "Parent got $line!\n";
            push(@pings, $line);
        }
    }
}
print "Parent closing fifo\n";
close(FIFO);

foreach my $ping (@pings) {
    print "PONG: $ping\n";
}

# Delete the FIFO whenever we exit
END {
    if($$ == $master) {
        unlink($fifo)
            or die "Couldn't unlink FIFO $fifo: $!\n";
        print "$$ unlinked $fifo\n";
    }
}
To keep the test case small, I've set the targets to 10.100.19.84 through 10.100.19.87, of which only .85 and .86 actually exist. With the 25 ms delay, the tcpdump output below shows the echo requests going out and replies coming back from .85 and .86, yet the script reports no response from any host.
09:33:29.275634 arp who-has 10.100.19.84 tell 10.100.19.1
09:33:29.305634 10.100.19.1 > 10.100.19.85: icmp: echo request (DF)
09:33:29.305634 10.100.19.85 > 10.100.19.1: icmp: echo reply
09:33:29.355634 10.100.19.1 > 10.100.19.86: icmp: echo request (DF)
09:33:29.355634 10.100.19.86 > 10.100.19.1: icmp: echo reply
09:33:29.395634 arp who-has 10.100.19.87 tell 10.100.19.1
09:33:30.275634 arp who-has 10.100.19.84 tell 10.100.19.1
09:33:30.395634 arp who-has 10.100.19.87 tell 10.100.19.1
09:33:31.275634 arp who-has 10.100.19.84 tell 10.100.19.1
09:33:31.395634 arp who-has 10.100.19.87 tell 10.100.19.1
If I increase the usleep delay from 25_000 µs (25 ms) to 10_000_000 µs (10 s, longer than the 5-second ping timeout), the pings no longer overlap, so the script effectively runs as a sequential loop, and I do get the replies for .85 and .86:
script output:
Child 10.100.19.84 started
10.100.19.84:
No response from 10.100.19.84
Child 10.100.19.84 exiting
Child 10.100.19.85 started
Found 10.100.19.85!
Child 10.100.19.86 started
Found 10.100.19.86!
Child 10.100.19.87 started
10.100.19.87:
No response from 10.100.19.87
Child 10.100.19.87 exiting
Parent opening FIFO
Parent looping over remaining kids
Parent looping wait
Parent reaping 10.100.19.87
Parent reading fifo
Parent reaping 10.100.19.84
Parent reading fifo
Locking FIFO
appending to FIFO
Child 10.100.19.85 exiting
Parent reaping 10.100.19.85
Parent reading fifo
Parent got 10.100.19.85!
Locking FIFO
appending to FIFO
Child 10.100.19.86 exiting
Parent reaping 10.100.19.86
Parent reading fifo
Parent got 10.100.19.86!
Parent closing fifo
PONG: 10.100.19.85
PONG: 10.100.19.86
28674 unlinked /tmp/fileMVZqps
tcpdump:
10:27:38.255634 arp who-has 10.100.19.84 tell 10.100.19.1
10:27:39.255634 arp who-has 10.100.19.84 tell 10.100.19.1
10:27:40.255634 arp who-has 10.100.19.84 tell 10.100.19.1
10:27:48.255634 10.100.19.1 > 10.100.19.85: icmp: echo request (DF)
10:27:48.255634 10.100.19.85 > 10.100.19.1: icmp: echo reply
10:27:51.665634 arp who-has 10.100.19.86 tell 10.100.19.1
10:27:51.665634 arp reply 10.100.19.86 is-at 0:0:50:b:b3:7f
10:27:53.255634 arp who-has 10.100.19.85 tell 10.100.19.1
10:27:53.255634 arp reply 10.100.19.85 is-at 0:0:50:b:b3:7f
10:27:58.275634 10.100.19.1 > 10.100.19.86: icmp: echo request (DF)
10:27:58.275634 10.100.19.86 > 10.100.19.1: icmp: echo reply
10:28:02.375634 arp who-has 10.100.19.1 tell 10.100.19.85
10:28:02.375634 arp reply 10.100.19.1 is-at 0:7:e9:9:8a:dd
10:28:08.295634 arp who-has 10.100.19.87 tell 10.100.19.1
10:28:09.295634 arp who-has 10.100.19.87 tell 10.100.19.1
10:28:10.295634 arp who-has 10.100.19.87 tell 10.100.19.1
My current hypothesis is that Net::Ping isn't smart enough to ignore ICMP replies that don't belong to the current process, and that the other children may be grabbing and discarding the replies meant for the children pinging .85 and .86. Any thoughts?
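One way to check this hypothesis is to stop sharing the pinger. As far as I can tell from the Net::Ping source, `new('icmp', ...)` opens the raw ICMP socket and records the ID (`$$ & 0xffff`) that tags outgoing echo requests, so every child forked after the single `new` in the parent shares one socket and one ID. If that is the cause, building the pinger inside each child should make the replies reappear. A minimal sketch of that experiment, assuming Perl 5.10+ (for `//`): `ping_parallel` and its `make_pinger` option are hypothetical names, and each result is reported through the child's exit status instead of the FIFO, a simplification swapped in for brevity.

```perl
use strict;
use warnings;
use Net::Ping;
use POSIX ();
use Time::HiRes qw(usleep);

# Ping every target from its own child process. The pinger is built
# *inside* the child, after fork, so no two children share a socket
# or an echo-request ID. make_pinger is a hypothetical hook; the
# default builds a real ICMP pinger, which needs root.
sub ping_parallel {
    my ($targets, %opt) = @_;
    my $timeout     = $opt{timeout}  // 5;
    my $delay_us    = $opt{delay_us} // 25_000;
    my $make_pinger = $opt{make_pinger}
        // sub { Net::Ping->new('icmp', $timeout) };

    my %kids;    # child pid => target
    for my $target (@$targets) {
        my $pid = fork;
        die "Can't fork for $target: $!\n" unless defined $pid;
        if ($pid == 0) {
            # Child: fresh pinger, nothing ICMP-related inherited.
            my $alive = eval {
                my $pinger = $make_pinger->();
                my $ok     = $pinger->ping($target);
                $pinger->close;
                $ok;
            };
            warn "$target: $@" if $@;
            # Exit status 0 = reply received. _exit skips END blocks
            # so the child never runs the parent's cleanup code.
            POSIX::_exit($alive ? 0 : 1);
        }
        $kids{$pid} = $target;
        usleep($delay_us) if $delay_us;    # stagger to avoid flooding
    }

    # Reap every child and remember which targets answered.
    my %answered;
    while (%kids) {
        my $pid = waitpid(-1, 0);
        last if $pid <= 0;
        my $target = delete $kids{$pid};
        $answered{$target} = 1 if defined $target && $? == 0;
    }
    return grep { $answered{$_} } @$targets;
}
```

Run as root, `my @alive = ping_parallel(\@targets, timeout => 5);` should list .85 and .86 if the shared pinger really is what swallows the replies; if they are still missing, the cause lies elsewhere.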
--isotope