http://qs321.pair.com?node_id=168210

dash2 has asked for the wisdom of the Perl Monks concerning the following question:

Gosh, it has been a while.

I'm back in the saddle, completing an open source project which I swore would be my last computer program. (We shall see.)

So, I am writing a distributed Game of Life. This required me to write two main layers to support it:

Net::Distributed, which is a generic toolkit for writing peer-to-peer programs (it is _not_ Yet Another Protocol, before you ask, but a toolkit for writing a specific class of programs)

Net::Distributed::Space, which specifically lets peers relate to each other as "neighbours" in a spatial universe. Each peer has a position and talks only to its neighbours (after some initial setting up). I think this provides for more interesting P2P architectures than the normal spiderwebby things: it provides for a clear division of responsibility.

My questions are about Net::Distributed::Space.

1. As you can imagine, getting every peer to know their position in a universe, and the net addresses and positions of their neighbours, is a tricky job which requires a lot of talking between peers. Sometimes, I want to slow down what a peer does - for example, if I am waiting to get some info from another peer, I may need to send out reminder messages, but I don't want to do this every second or the network will be swamped!

So I use a brake. Essentially, all Space objects have a main loop where they:

  1. receive messages,
  2. deal with them,
  3. run an internal processing routine.

Two possible types of brakes are

    if ($self->{times_round}++ < $self->{wait_this_many_times}) {
        return; # brake by counting loops
    }

and

    $self->{timer} ||= time;
    if (time - $self->{timer} < $self->{min_interval}) {
        return; # brake by counting seconds
    }
    $self->{timer} = time;

Does anyone have any experience with the relative merits and demerits of these two approaches?
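For concreteness, here is a minimal, self-contained sketch of both brakes so they can be compared side by side. The package name Brake::Demo, the method names and the injectable clock are made up for illustration and are not part of Net::Distributed::Space. Note one deliberate difference: the loop-counting snippet above never resets its counter, whereas this sketch resets it so that the brake keeps applying on later passes.

```perl
use strict;
use warnings;

package Brake::Demo;    # hypothetical class, for illustration only

sub new {
    my ($class, %args) = @_;
    my $self = {
        wait_this_many_times => $args{wait_this_many_times} // 3,
        min_interval         => $args{min_interval}         // 2,
        times_round          => 0,
        # Injectable clock (an assumption) so the time brake can be tested
        # without sleeping; defaults to the built-in time().
        clock                => $args{clock} || sub { time },
    };
    return bless $self, $class;
}

# Brake by counting loops: returns 1 on every Nth call, 0 otherwise.
sub loop_brake_released {
    my $self = shift;
    return 0 if ++$self->{times_round} < $self->{wait_this_many_times};
    $self->{times_round} = 0;    # reset so the brake applies again next time
    return 1;
}

# Brake by counting seconds: returns 1 once min_interval seconds have
# passed since the last release, 0 otherwise.
sub time_brake_released {
    my $self = shift;
    my $now  = $self->{clock}->();
    $self->{timer} //= $now;
    return 0 if $now - $self->{timer} < $self->{min_interval};
    $self->{timer} = $now;
    return 1;
}

package main;

# Drive both brakes with a fake clock that advances one second per pass.
my $fake_now = 100;
my $peer = Brake::Demo->new(
    wait_this_many_times => 3,
    min_interval         => 2,
    clock                => sub { $fake_now },
);

my @loop_results = map { $peer->loop_brake_released } 1 .. 6;

my @time_results;
for my $t (100 .. 104) {
    $fake_now = $t;
    push @time_results, $peer->time_brake_released;
}

print "loop brake: @loop_results\n";
print "time brake: @time_results\n";
```

The obvious difference is that the loop counter ties the reminder rate to how fast the main loop spins, which depends on machine load and message volume, while the timer ties it to wall-clock time at the cost of one extra time call per pass.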

2. All my testing is on one machine, using the loopback so that peers can pretend they are on separate machines. This is bad but good.

Bad because I am implementing a distributed "grid-shaped" system over a centralised "star-shaped" system (with lo in the center). This causes heavy traffic even with just a few peers!

Good because that heavy traffic lets me test my system under tough conditions.

Anyway, does anybody know of some good Linux-based (preferably KDE-based) tools to inspect the loopback interface? It would be useful for me to watch messages as they are actually transmitted and picked up.

3. Debugging this stuff is difficult. You can't use the debugger because with 10 peers firing off simultaneously, it would just be too slow. So I just insert loads of debugging statements and then tweak the debug level of particular peers or parts of the code. Has anyone got any creative solutions for debugging networked code in realtime?
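To show what I mean by tweaking debug levels, here is a rough sketch of the idea, not my actual code. The package Debug::Levels, its functions and the key names ('peer3', 'peer7') are invented for this example. Each message carries a key (a peer or a part of the code) and a level, and is printed only if that key's current threshold is high enough.

```perl
use strict;
use warnings;

package Debug::Levels;    # hypothetical helper, for illustration only

# Threshold per key; a key can name a peer or a part of the code.
# Keys without their own entry fall back to 'default'.
my %level_for = (default => 0);

sub set_level {
    my ($key, $level) = @_;
    $level_for{$key} = $level;
    return;
}

# Print $message to STDERR if $level is within the threshold for $key.
# Returns the formatted line if it was printed, nothing otherwise.
sub debug {
    my ($key, $level, $message) = @_;
    my $threshold = exists $level_for{$key}
        ? $level_for{$key}
        : $level_for{default};
    return unless $level <= $threshold;
    my $line = sprintf "[%s] %s", $key, $message;
    print STDERR "$line\n";
    return $line;
}

package main;

# Turn up the volume on one peer only; everything else stays quiet.
Debug::Levels::set_level('peer3', 2);

my $shown  = Debug::Levels::debug('peer3', 2, 'neighbour request resent');
my $hidden = Debug::Levels::debug('peer7', 2, 'neighbour request resent');
my $basic  = Debug::Levels::debug('peer7', 0, 'started');
```

With this arrangement the noisy level-2 output appears only for the one peer under suspicion, while the other peers still report their level-0 messages.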

Cheers. The final result will, of course, be posted on PM!

dave hj