http://qs321.pair.com?node_id=168210

dash2 has asked for the wisdom of the Perl Monks concerning the following question:

Gosh, it has been a while.

I'm back in the saddle, completing an open source project which I swore would be my last computer program. (We shall see.)

So, I am writing a distributed Game of Life. This required me to write two main layers to support it:

Net::Distributed, which is a generic toolkit for writing peer-to-peer programs (it is _not_ Yet Another Protocol, before you ask, but a toolkit for writing a specific class of programs)

Net::Distributed::Space, which specifically lets peers relate to each other as "neighbours" in a spatial universe. Each peer has a position and talks only to its neighbours (after some initial setting up). I think this provides for more interesting P2P architectures than the normal spiderwebby things: it provides for a clear division of responsibility.

My questions are about Net::Distributed::Space.

1. As you can imagine, getting every peer to know their position in a universe, and the net addresses and positions of their neighbours, is a tricky job which requires a lot of talking between peers. Sometimes, I want to slow down what a peer does - for example, if I am waiting to get some info from another peer, I may need to send out reminder messages, but I don't want to do this every second or the network will be swamped!

So I use a brake. Essentially, all Space objects have a main loop where they:

  1. receive messages,
  2. deal with them,
  3. run an internal processing routine.

Two possible types of brakes are

    if ($self->{times_round}++ < $self->{wait_this_many_times}) {
        return; # brake by counting loops
    }

and

    $self->{timer} ||= time;
    if (time - $self->{timer} < $self->{min_interval}) {
        return; # brake by counting seconds
    }
    $self->{timer} = time;

Does anyone have any experience on the relative merits/demerits of these two approaches?
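For comparison, here is a minimal sketch of both brakes as small closures. The names make_time_brake and make_loop_brake are hypothetical and not part of Net::Distributed::Space. Note two differences from the snippets as posted: the loop counter here resets after each release (as posted, it would only hold back the first N passes), and the time brake releases on its very first call rather than after the first interval.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that says "go" (1) at most once
# per $min_interval seconds. The clock is injectable so it can be tested.
sub make_time_brake {
    my ($min_interval, $clock) = @_;
    $clock ||= sub { time };
    my $last;    # undef until the first release
    return sub {
        my $now = $clock->();
        return 0 if defined $last && $now - $last < $min_interval;
        $last = $now;
        return 1;
    };
}

# Hypothetical helper: says "go" (1) once every $n calls, whatever the time.
sub make_loop_brake {
    my ($n) = @_;
    my $count = 0;
    return sub {
        return 0 if ++$count < $n;
        $count = 0;    # reset so the brake keeps working after the first release
        return 1;
    };
}
```

The time brake gives the same reminder rate on a fast or slow machine; the loop brake ties the rate to how quickly the main loop happens to spin, which changes with load.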

2. All my testing is on one machine, using the loopback so that peers can pretend they are on separate machines. This is bad but good.

Bad because I am implementing a distributed "grid-shaped" system over a centralised "star-shaped" system (with lo in the center). This causes heavy traffic even with just a few peers!

Good because that heavy traffic lets me test my system under tough conditions.

Anyway, does anybody know of some good Linux- (pref. KDE-) based tools to inspect the loopback? It would be useful for me to watch messages as they are actually transmitted and picked up.

3. Debugging this stuff is difficult. You can't use the debugger because with 10 peers firing off simultaneously, it would just be too slow. So I just insert loads of debugging statements and then tweak the debug level of particular peers or parts of the code. Has anyone got any creative solutions for debugging networked code in realtime?

Cheers. The final result will, of course, be posted on PM!

dave hj

Replies are listed 'Best First'.
Re: networking over the loopback
by ferrency (Deacon) on May 21, 2002 at 19:24 UTC
    This sounds like a really interesting project. I'm not sure if the size of the "life" problem is really big enough to warrant the infrastructure you're creating for it, but it's definitely interesting as a proof of concept, in any case :)

    As for your questions:

    1. It sounds like your main problem in #1 is synchronizing your intercommunicating processes. Each process only wants to take "the next step" when it has results from all of its neighbors for the previous step.

    Both of your solutions are essentially "polling": running a relatively tight loop, checking for a condition until something interesting happens. This can suck up CPU time like crazy. It's much more efficient to tell the OS "I'm not doing anything right now; wake me when something interesting happens" so the kernel can give another process use of the CPU. You can do this in a few ways. The easiest way is to continue polling, but put a "sleep" in your loop. This will greatly slow down your process, since the built-in sleep only works in whole seconds. But it'll give other processes more CPU time. Easiest, not best.

    Depending on your design, it may be a Lot better to use select or IO::Select to wait for input from all of your peers. This is more difficult, but also much more efficient. I'd check the IO::Select perldoc and post a followup if you need more help with it.
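As a rough sketch of the IO::Select approach: the process blocks in can_read until some peer has data or a timeout expires, so no CPU is burned while waiting. The helper name wait_for_messages is hypothetical, and a local socketpair stands in for real peer connections. It uses sysread rather than buffered reads, since buffered reads and select do not mix well.

```perl
use strict;
use warnings;
use IO::Select;
use Socket;

# Hypothetical helper: block until a watched handle is readable or $timeout
# seconds pass, then return what arrived as [$fh, $data] pairs.
sub wait_for_messages {
    my ($select, $timeout) = @_;
    my @messages;
    for my $fh ($select->can_read($timeout)) {
        my $n = sysread($fh, my $buf, 4096);
        if (!$n) {                   # EOF or error: stop watching this peer
            $select->remove($fh);
            next;
        }
        push @messages, [$fh, $buf];
    }
    return @messages;
}

# Demo: a local socket pair stands in for a connection to one peer.
socketpair(my $peer, my $us, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
my $select = IO::Select->new($us);
syswrite($peer, "hello\n");
for my $msg (wait_for_messages($select, 0.5)) {
    print "got: $msg->[1]";
}
```

The timeout doubles as the brake: when can_read returns nothing, that is the moment to send a reminder message.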

    2. Does tcpdump let you inspect the loopback? If so, there are other tools such as trafshow that may give you similar but more readable information. I'm not really sure on this one.

    3. I'm also not sure what to do for debugging. However, I'd suggest creating a test environment and simplifying things while testing. Instead of running 10 interconnected processes, run one process which is connected to a parent test process, just to make sure it works with one peer. Then try having the parent test process open multiple connections to the child, and test that. Once you're sure the child works very well in a sandbox, spawn a bunch of them in an interconnected way and it should be much easier to ferret out the few remaining bugs.

    As for particular debugging tools: people seem to like ddd a lot, and it looks cool from what I've seen, but I haven't spent the time to become very familiar with it. It is probably also very difficult to use with multiple processes.

    In any case, if you have all your processes log to the same file, and use file locking on that file so each child's output is sent all at once instead of being interrupted by other children, debugging would be a lot easier than trying to read a streaming STDERR.
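A minimal sketch of that shared log, assuming every peer process can reach the same file path. The helper name log_record is hypothetical. Each call takes an exclusive flock, writes its whole record, and releases the lock on close.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: append one complete record to a shared log file,
# holding an exclusive lock so records from other processes cannot interleave.
sub log_record {
    my ($path, @lines) = @_;
    open(my $fh, '>>', $path) or die "cannot open $path: $!";
    flock($fh, LOCK_EX)       or die "cannot lock $path: $!";
    print {$fh} "[$$] $_\n" for @lines;    # prefix each line with the process ID
    close($fh)                or die "cannot close $path: $!";    # flushes, then unlocks
}
```

Usage would look like log_record('/tmp/peers.log', 'got neighbour list', 'sending ack'), where the path is only an example.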

    I hope this helps :)

    Alan

Re: networking over the loopback
by Rhose (Priest) on May 21, 2002 at 19:46 UTC
    Yes, tcpdump can indeed watch the loopback.

    tcpdump -i lo -w /tmp/loopback.log

    This will log all packets on lo to the file /tmp/loopback.log. You can then read/analyze them with:

    tcpdump -r /tmp/loopback.log 'filter stuff here'

    Update:

    I thought about this a little more this morning, and if you have the extra resources, and know a little about writing rules, you should be able to use an IDS (intrusion detection system) -- like Snort -- to monitor for/alert on specific events. Now that I think about it, an IDS could be a really good debugging tool for a project like you describe. (And, if you get to the point where you are testing on multiple system on a LAN, your IDS will still be able to help as long as it is located on the same segment.)

      And in the spirit of using Perl for reinventing wheels, a TCP packet dumper using Net::PcapUtils ... :-)

      use strict;
      use Net::PcapUtils;
      use NetPacket::Ethernet qw/:types/;
      use NetPacket::IP;
      use NetPacket::TCP;

      Net::PcapUtils::loop(
          sub {
              my ($arg, $header, $packet) = @_;
              my $ethernet = NetPacket::Ethernet->decode($packet);
              # Note: every IP packet is treated as TCP; other protocols are not filtered out.
              if ($ethernet->{'type'} == ETH_TYPE_IP) {
                  my $ip  = NetPacket::IP->decode($ethernet->{'data'}, $ethernet);
                  my $tcp = NetPacket::TCP->decode($ip->{'data'});
                  print $ip->{'src_ip'}, ":", $tcp->{'src_port'}, " -> ",
                        $ip->{'dest_ip'}, ":", $tcp->{'dest_port'}, "\n\n";
                  # Hex-dump the payload, eight bytes per row.
                  my $payload = defined $tcp->{'data'} ? $tcp->{'data'} : '';
                  my @data = split //, $payload;
                  while (@data) {
                      print "\t";
                      # ord() turns each character into its byte value for %02x
                      printf("%02x ", ord($_)) for splice(@data, 0, 8);
                      print "\n";
                  }
                  print "\n";
              }
          },
          'DEV' => 'lo'
      );

       

Re: networking over the loopback
by Fletch (Bishop) on May 21, 2002 at 21:12 UTC

    Ethereal's another good freely available packet sniffer. It seems to be able to watch the lo interface fine on my RH7.3 box.

Re: networking over the loopback
by shotgunefx (Parson) on May 22, 2002 at 08:15 UTC
    Well, as far as a brake goes, you could use Time::HiRes to get sleeps < 1 second if desired. You could even adjust it faster or slower depending on other factors if need be.
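A small sketch of that idea, assuming the "other factor" is how many passes in a row the peer has been idle. The helper name next_delay and the specific numbers are hypothetical. Time::HiRes replaces the built-in sleep and time with versions that accept and return fractional seconds.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical helper: exponential backoff. Doubles the delay for every
# idle pass, starting at $base seconds and never exceeding $max seconds.
sub next_delay {
    my ($idle_passes, $base, $max) = @_;
    my $delay = $base * 2 ** $idle_passes;
    return $delay > $max ? $max : $delay;
}

# Demo: three idle passes, sleeping a little longer each time.
for my $idle (0 .. 2) {
    my $delay = next_delay($idle, 0.05, 0.2);
    my $start = time;
    sleep($delay);    # fractional sleep; the process yields the CPU meanwhile
    printf "idle pass %d: asked for %.2fs, slept %.3fs\n",
        $idle, $delay, time - $start;
}
```

Resetting the idle count to zero whenever a message arrives makes the peer responsive again as soon as traffic picks up.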



    -Lee

    "To be civilized is to deny one's nature."
Re: networking over the loopback
by dash2 (Hermit) on May 23, 2002 at 08:56 UTC
    Thank you all for your very helpful comments.

    Status at the moment is: I have a simple game of life working (where each peer only handles one GOL square - obviously this makes network traffic massive!) but need to make the Organizer (which tells peers where they are and who their neighbours are) more reliable. The Organizer is client-server based at the moment - it is easier to have one peer marshalling all the others into position before they all start talking to each other - but it's nicely subclassed so a truly self-organizing space will be quite easy to write. (Essentially, when a newbie peer arrives, each peer "points" it to the next peer, until the newbie gets to the frontier and finds a free position; all peers know their universe size and don't let anyone get beyond it.)
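The "pointing" idea can be illustrated with a toy model. This is only a sketch under heavy assumptions: the universe is reduced to a single row of positions, hops always go to the right-hand neighbour, and find_free_position is a hypothetical name, not the Organizer's actual interface. In a real space each hop would be a message between peers rather than a loop in one process.

```perl
use strict;
use warnings;

# Toy 1-D universe: positions 0 .. $size-1; %$occupied maps position => peer name.
# Hypothetical helper: starting at $start, hop from peer to peer until a free
# position inside the universe turns up. Returns that position, or nothing
# if the newcomer reaches the edge without finding room.
sub find_free_position {
    my ($occupied, $size, $start) = @_;
    my $pos = $start;
    while ($pos < $size) {
        return $pos unless exists $occupied->{$pos};
        $pos++;    # this peer "points" the newcomer at its right-hand neighbour
    }
    return;        # edge of the universe reached: no one is let beyond it
}

my %occupied = (0 => 'peer_a', 1 => 'peer_b', 2 => 'peer_c');
my $free = find_free_position(\%occupied, 5, 0);
print defined $free ? "newcomer settles at position $free\n" : "universe is full\n";
```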

    dave hj~