Re: RFC - Linux::TCPServer (new module)

Thanks for the input/testing all. I've decided to hold off publishing this to CPAN for now (and let this thread die out), pending the name change for sure, and potentially a couple of other big changes which affect the name change.

First, I'm thinking perhaps it's best to change the interface of the module so that it is 99% compatible with the interface of the pure-perl Net::Server::* module hierarchy, and to publish it into that space as Net::Server::Linux. I don't think it would be so bad to intrude on that space without inheriting from the existing modules, as long as I provide a nearly identical module interface to the user.

And since I currently only support TCP, but there's not really a "clean" way to put "TCP" in the name in that namespace, perhaps I should just go ahead and find an intelligent way to support UDP as well, so that the protocol-agnostic "Net::Server::Linux" name is a little more appropriate. (Or at the very least, document that future support for UDP is planned within this module.)

So in all likelihood, this will end up emerging as Net::Server::Linux with a heavily changed module interface and potentially UDP support sometime in the near future.

Once again, thanks for all the input, it's been very helpful.

And if you have a use for the existing incarnation of the module, feel free to use it as is for now - just beware that the name and the interface will change shortly.

Re^2: RFC - Linux::TCPServer (new module) by Anonymous Monk on Nov 02, 2005 at 17:31 UTC
Make a general purpose IO::Epoll based server :-)
Re^3: RFC - Linux::TCPServer (new module) by ph713 (Pilgrim) on Nov 06, 2005 at 02:44 UTC
Actually I didn't use epoll() for Linux::TCPServer. If I were building a more flexible solution I would have, though. For the most part, the significant speed and efficiency gains I got (I still use this module as-is for now in some other proprietary code I'm developing) came primarily from three things:

1. Single-port-ness. I only needed to listen on a single tcp port, and quite frankly I think most people using such a module are in the same boat. Single-port can be implemented much more efficiently than multi-port, because you can (at least on Linux, probably most others) just block on accept() in all the children on a shared socket the parent established before the forks, without locking anything. It's a big win, and could be done as easily in perl as it was in C.

2. Doing this stuff in perl is just inherently an inefficient proposition, whereas doing it in C is inherently efficient if done right. One of my "lessons learned" from this experience is that apparently not many developers of perl and/or perl modules really look at the net effect of what they're doing with tools like strace. Coming from C-land, if you were to take a server like Net::Server::PreFork, run a tcp service over it, and strace it, you would be shocked. And I'm not trying to pick on Net::Server::PreFork specifically; a lot of perl modules are like that.

IO::Socket::INET is relevant and easier to pick on, although Linux::TCPServer doesn't handle replacing it (in my proprietary code, however, I did). You'd be amazed at the system call waste in IO::Socket::INET for a simple tcp connection. For instance, it does a real getpeername() system call twice every time it receives a packet (or was it sends? I can't remember now, it was a while back). Or there's the fact that it actually goes and reads /etc/protocols two to three times in a row on every socket object creation to figure out that TCP is protocol number 6 (which hasn't changed in, like, decades, on any operating system). This includes incoming socket objects in a server off of an accept() call. If you process 50 connections a second, you're going to open /etc/protocols and scan it for "tcp" 100+ times a second.

You don't notice this stuff in small simple applications, but when you're processing large volumes of network connections, it adds up. Some of this only manifests as a result of sloppy, overly generic module coding, combined with handling the leaky abstractions of perl, combined with attributes of the local system's C library and whatnot. But it's important to look at the big picture for major platforms, and Linux/glibc is definitely a major platform for perl.

3. Efficient direct access to shared memory to track and update child state. There are times (like this one) where you know that a very efficient solution for a problem can be made by creating a real shared memory array of integers and indexing it directly from multiple threads of execution. Perl doesn't offer any clean api for this, although a semi-portable XS module could offer it with only a slight efficiency loss. "use threads" + "my @array : shared" doesn't even come close to realizing this, as anyone more familiar than me with perl internals knows. I think Net::Server::PreFork ends up communicating over a socket to its children, for example, because that's about the best you can hope for in generic platform-abstract perl-land.

So in summary, what C had to offer over pure perl in this case was that it didn't egregiously waste system calls and disk i/o for the sake of ease of abstraction, and it allowed me direct access to real hardware shared memory arrays (which are available on many, many platforms, probably most that perl can run on). But epoll() had nothing to do with it, really.

UPDATE: Having written that and now reflected on the issues further as a result, one of the key efficiency problems for the perl socket infrastructure in general is the attempt to abstract all "sockets" to look alike. Just because they are all "sockets" at some API level does not mean that it's a good idea to abstract all sockets together into a single class hierarchy, or to treat them the same within perl itself. The world would be a better place if tcp, udp, raw ip, unix, and any other distinct flavor of socket were uniquely different types in the core perl code, and if modules were written separately and specifically for each protocol. Doing it "right" for all of them in one generic chunk of code is damn near impossible.

On top of that, the choice between udp, tcp, unix, raw ip, etc. is a very big design decision for any socket user. You cannot arbitrarily switch socket types without rethinking and re-coding everything you do anyways, unless you're inviting bad design to begin with. Therefore there's not much gain from the abstraction. We have a case here of N things abstracted into a single interface which exhibit wildly different characteristics, and those characteristics always matter to the application at hand, as well as in terms of the libc/kernel api code on the bottom side of perl.
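
For anyone who wants to see the shape of the first and third points, here is a minimal C sketch, assuming Linux/glibc. It is not the Linux::TCPServer source: the names prefork_demo, worker, handled, and MAX_CHILDREN are hypothetical, the one-byte "ack" protocol exists only so the demo can check itself, and error handling is cut to the bone. The parent creates one listening socket (passing IPPROTO_TCP as a constant, so nothing goes looking in /etc/protocols), maps an anonymous shared array of ints, forks the children, and each child simply blocks in accept() on the inherited socket with no locking.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS under strict -std modes */
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64            /* hypothetical limit for this sketch */

/* Child loop: block in accept() on the shared socket; no locking needed. */
static void worker(int listen_fd, volatile int *handled, int slot)
{
    for (;;) {
        int c = accept(listen_fd, NULL, NULL);
        if (c < 0) {
            if (errno == EINTR)
                continue;
            _exit(1);              /* sketch: give up on any other error */
        }
        handled[slot]++;           /* each child writes only its own slot */
        if (write(c, "k", 1) != 1) {
            /* ack failed; ignored in this sketch */
        }
        close(c);
    }
}

/* Forks nchildren workers, makes nconns loopback connections, and returns
 * the total served as read from the shared array, or -1 on setup failure. */
int prefork_demo(int nchildren, int nconns)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    pid_t pids[MAX_CHILDREN];
    int i, started = 0, total = 0;

    if (nchildren < 1 || nchildren > MAX_CHILDREN)
        return -1;

    /* Protocol given as a constant: no getprotobyname() lookup. */
    int lfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;             /* let the kernel pick a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 16) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        close(lfd);
        return -1;
    }

    /* Zero-filled int array shared by the parent and every child. */
    size_t bytes = (size_t)nchildren * sizeof(int);
    volatile int *handled = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (handled == MAP_FAILED) {
        close(lfd);
        return -1;
    }

    for (i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            worker(lfd, handled, i);
            _exit(0);
        }
        pids[started++] = pid;
    }

    /* Act as the client: connect, then wait for the one-byte ack. */
    for (i = 0; started > 0 && i < nconns; i++) {
        char ack;
        int c = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        if (c < 0)
            break;
        if (connect(c, (struct sockaddr *)&addr, sizeof addr) == 0) {
            ssize_t n = read(c, &ack, 1);
            (void)n;
        }
        close(c);
    }

    for (i = 0; i < started; i++) {
        kill(pids[i], SIGKILL);
        waitpid(pids[i], NULL, 0);
    }
    for (i = 0; i < nchildren; i++)
        total += handled[i];       /* parent reads child state directly */

    munmap((void *)handled, bytes);
    close(lfd);
    return started > 0 ? total : -1;
}
```

Calling prefork_demo(4, 8) should return 8 when loopback networking is available; which child ends up serving which connection is entirely up to the kernel. A real server would keep richer per-child state in the array (idle/busy flags, request counts) and would not have the parent play client.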


	PerlMonks