PerlMonks  

RFC - Linux::TCPServer (new module)

by ph713 (Pilgrim)
on Oct 29, 2005 at 17:12 UTC ( [id://503887]=perlmeditation )

This is the first module I've ever built that seems to have enough generic applicability that I'm going to upload it to CPAN. I'd like to get some feedback here on PM before publishing it to CPAN, especially on the namespace of the module. It implements an efficient and fairly advanced pre-forking tcp network server in C with callbacks to perl for module users. It will not work on anything but Linux, and beyond that requires perl 5.8.1+, kernel 2.4+, glibc 2.2.5+, and gcc 3.1+. Any feedback welcome.

The module tarball as I expect to send it to CPAN, prior to any changes picked up from feedback here, is currently downloadable from: http://www.dtmf.com/Linux-TCPServer-0.16.tar.gz

The largest issue at the moment (aside from the usual: finding and fixing any bugs, and best practices in terms of module file layout and documentation, etc) is the module's name. In perusing the CPAN modulelist, it seems like a ^Linux:: name is appropriate (of the 23 modules with linux in their names, only one is outside of ^Linux::). One monk who might well know better than me has suggested that it should perhaps take a form more like TCPserver::Linux or Net::TCPserver::Linux.

I was also considering Linux::Net::TCPServer, but for some irrational reason that seems to imply to me that it's for networking between Linux machines only, which isn't true (it only runs on Linux, but it will communicate with anything). . . . ? ? ?

UPDATED Dec 15 2005: New version 0.16 is out, just in case anyone was watching this dead space :) , name still hasn't changed yet. I honestly feel a bit trapped by the whole naming issue. On CPAN, it doesn't belong where it fits and doesn't fit where it belongs, but I think it's useful and I'd still like to stick it out there, and I really don't foresee myself having the desire or time to add UDP support.

Replies are listed 'Best First'.
Re: RFC - Linux::TCPServer (new module)
by dragonchild (Archbishop) on Oct 29, 2005 at 19:04 UTC
    I would recommend Net::TCPserver::Linux as well. An important item to document will be what yours does that a pure-perl solution doesn't. If it's speed, I'd include benchmarks.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      There's some commentary in the .pod on how the code takes advantage of mmap() shared anonymous memory and lockless IPC for efficiency gains, but you're right, some good benchmarking versus, say, Net::Server::PreFork would be nice to have in there. I'll have to write up something to do the testing with.

      Update: It looks like Siege will be good for doing the testing. I'm writing up a test script that will do basic HTTP/1.0 responses to their benchmark and run under Linux::TCPServer or Net::Server::PreFork now, we'll see how it fares.

        FYI, I have some preliminary results, and it looks like I'm doing about 2-3x the connection handling speed of the pure perl competition depending on a lot of little variables. The results in the module distribution will of course have to include more details, and I'll leave the benchmarking script in the module too:

        Linux::TCPServer - 100 connections per child process:

        ** siege 2.64
        ** Preparing 3 concurrent users for battle.
        The server is now under siege.. done.
        Transactions:              15000 hits
        Availability:              100.00 %
        Elapsed time:              6.93 secs
        Data transferred:          62.96 MB
        Response time:             0.00 secs
        Transaction rate:          2164.50 trans/sec
        Throughput:                9.08 MB/sec
        Concurrency:               2.72
        Successful transactions:   15000
        Failed transactions:       0
        Longest transaction:       0.44
        Shortest transaction:      0.00

        Linux::TCPServer - 1000 connections per child process:

        ** siege 2.64
        ** Preparing 3 concurrent users for battle.
        The server is now under siege.. done.
        Transactions:              15000 hits
        Availability:              100.00 %
        Elapsed time:              7.64 secs
        Data transferred:          62.96 MB
        Response time:             0.00 secs
        Transaction rate:          1963.35 trans/sec
        Throughput:                8.24 MB/sec
        Concurrency:               2.82
        Successful transactions:   15000
        Failed transactions:       0
        Longest transaction:       0.71
        Shortest transaction:      0.00

        Net::Server::PreFork - 100 connections per child process:

        ** siege 2.64
        ** Preparing 3 concurrent users for battle.
        The server is now under siege.. done.
        Transactions:              15000 hits
        Availability:              100.00 %
        Elapsed time:              19.89 secs
        Data transferred:          62.96 MB
        Response time:             0.00 secs
        Transaction rate:          754.15 trans/sec
        Throughput:                3.17 MB/sec
        Concurrency:               2.87
        Successful transactions:   15000
        Failed transactions:       0
        Longest transaction:       0.75
        Shortest transaction:      0.00

        Net::Server::PreFork - 1000 connections per child process:

        ** siege 2.64
        ** Preparing 3 concurrent users for battle.
        The server is now under siege.. done.
        Transactions:              15000 hits
        Availability:              100.00 %
        Elapsed time:              14.92 secs
        Data transferred:          62.96 MB
        Response time:             0.00 secs
        Transaction rate:          1005.36 trans/sec
        Throughput:                4.22 MB/sec
        Concurrency:               2.61
        Successful transactions:   15000
        Failed transactions:       0
        Longest transaction:       1.70
        Shortest transaction:      0.00
        Artificial benchmarking has proved to be a wise path to go down indeed. It has uncovered some issues where I was leaking a little bit (either PerlIO objects or the perl stack in general, hard to tell which) that weren't apparent in my (rather strenuous, I thought) real-world testing. An update to 0.14 is coming sometime Sunday that moves some of the leaky XS code for converting socket FDs into perl IO objects back into perl, where at least it works correctly. A change in the handling of socket closing is pending too, as my original understanding of the whole orderly TCP shutdown issue was wrong (it turns out to be a very application-protocol-specific thing, so I'll leave that to the module users if they need it).
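For readers wondering what an "orderly" close can look like when an application does want one, here is a minimal C sketch of one common variant: half-close the sending side, drain what the peer still sends until EOF, then close. The function name `graceful_close` is hypothetical and this is not taken from the module; as noted above, the right behaviour depends on the application protocol, and a real server would probably also want a timeout on the drain loop.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: half-close our sending side, read until the peer
 * closes its side, then close the descriptor.
 * Returns the number of bytes drained, or -1 on error. */
long graceful_close(int fd)
{
    char buf[4096];
    long drained = 0;
    ssize_t n;

    assert(fd >= 0);

    /* Tell the peer we are done sending (a FIN on a TCP socket). */
    if (shutdown(fd, SHUT_WR) < 0) {
        close(fd);
        return -1;
    }

    for (;;) {
        n = read(fd, buf, sizeof(buf));
        if (n > 0) {                /* discard late data from the peer */
            drained += n;
            continue;
        }
        if (n == 0)                 /* EOF: the peer closed its side too */
            break;
        if (errno == EINTR)         /* interrupted by a signal: retry */
            continue;
        close(fd);
        return -1;
    }

    return close(fd) == 0 ? drained : -1;
}
```

A protocol in which the server always speaks last may not need the drain loop at all, which is one reason to leave this decision to the caller.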
Re: RFC - Linux::TCPServer (new module)
by tirwhan (Abbot) on Oct 30, 2005 at 09:23 UTC

    I also think I like Net::TCPServer::Linux best. Regarding the documentation, I hesitate to say it (because it's great to see someone take the time to actually provide extensive documentation), but it's a bit much ;-). A lot of the information given in the .pod is on the internals of the modules, design decisions and general networking background. IMO the module documentation should be there foremost to describe the user interface, things that are important to the user of the module. I'm not saying that any of the information you give is useless, but perhaps some of it could be broken out into a separate document (or separate sections in the docs). As it is, valuable user information is mixed together with design decisions and it's a lot to read for someone who is not familiar with the module. YMMV.

    The example in the pod also seems a bit long and would IMO be better in a separate script in an example/ subdirectory. Examples in the pod should be general-purpose and preferably only show how the module itself works (and not include external database connections and the like). Again, YMMV, and I'll be interested to read what others think of this.


    Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan

      Yeah you're probably right about all of that. The module was written out of personal need, so I guess I've always been coming at it as a user, and thinking in terms of how to best write code that uses the module when I'm writing the documentation. I think I will still keep most of the extraneous detailed stuff in another pod somewhere though. Basically, if your project *really* needs this module, chances are high you're going to want to know all of that stuff if you don't already (the network stuff that is).

      Actually, it's not worth making a separate pod just to rehash socket-related manpages; curious users can figure that out for themselves.

      The implementation details probably don't matter so much, I may just move them over to comments in the C code.

      I think I'll just kill the example in the docs and tell people to read the lib/Net/TCPServer/Linux.pm source for the example. It's basically the same thing minus the database stuff, and it's the default callback implementation that gets used by the test script, etc.

      ETA: That exact BWK quote has been in the back of my mind for the past couple of weeks, because I know I've been too clever for my own good in numerous places in the C source of this module. :)
Re: RFC - Linux::TCPServer (new module)
by ph713 (Pilgrim) on Nov 01, 2005 at 02:52 UTC

    Thanks for the input/testing all. I've decided to hold off publishing this to CPAN for now (and let this thread die out), pending the name change for sure, and potentially a couple of other big changes which affect the name change.

    First, I'm thinking perhaps it's best to change up the interface of the module such that it is 99% compatible with the interface of the pure perl Net::Server::* module hierarchy, and publish it into that space as Net::Server::Linux. I don't think it would be so bad to intrude on that space without inheriting from the existing modules as long as I provide a nearly identical module interface to the user.

    And since I currently only support TCP, but there's not a really "clean" way to put "TCP" in the name in that namespace, perhaps I should just go ahead and find an intelligent way to support UDP as well, so that the protocol-agnostic "Net::Server::Linux" can be a little more appropriate. (Or at the very least, document that future support for UDP is planned within this module)

    So in all likelihood, this will end up emerging as Net::Server::Linux with a heavily changed module interface and potentially UDP support sometime in the near future.

    Once again, thanks for all the input, it's been very helpful.

    And if you have a use for the existing incarnation of the module, feel free to use it as is for now - just beware that the name and the interface will change shortly.

      Make a general purpose IO::Epoll based server :-)

        Actually I didn't use epoll() for Linux::TCPServer. If I were making a somewhat more flexible solution I would have, though. The significant speed and efficiency gains I got (I still use this module as-is for now in some other proprietary code I'm developing) came primarily from three things:

        • Single-port-ness. I only needed to listen to a single tcp port, and quite frankly I think most people using such a module are in the same boat. Single port can be implemented much more efficiently than multi-port, because you can (at least on Linux, probably most others) just block on accept() in all the children on a shared socket the parent established before the forks without locking anything. It's a big win, and could be done as easily in perl as it was in C.
        • Doing this stuff in perl is just inherently an inefficient proposition, whereas doing it in C is inherently efficient if done right. One of my "lessons learned" from this experience is that apparently not many developers of perl and/or perl modules really look at the net effect of what they're doing with tools like strace. Coming from C-land, if you were to take a server like Net::Server::PreFork and run a tcp service over it and strace it, you would be shocked.

          And I'm not trying to pick on Net::Server::PreFork specifically, a lot of perl modules are like that. IO::Socket::INET is relevant and easier to pick on, although Linux::TCPServer doesn't handle replacing it (in my proprietary code, however, I did). You'd be amazed at the system call waste in IO::Socket::INET for a simple tcp connection. It does a real getpeername() system call twice every time it receives a packet (or was it sends, I can't remember now, it was a while back), for instance.

          Or the fact that it actually goes and reads /etc/protocols *two to three times in a row* on every socket object creation to figure out that TCP is protocol number 6 (which hasn't changed in like, decades, on any operating system). This includes incoming socket objects in a server off of an accept() call. If you process 50 connections a second, you're going to open /etc/protocols and scan it for "tcp" 100+ times a second. You don't notice this stuff in small simple applications, but when you're processing large volumes of network connections, it adds up. Some of this only manifests as a result of sloppy overly generic module coding combined with handling the leaky abstractions of perl, combined with attributes of the local system's C library and whatnot. But it's important to look at the big picture for major platforms, and Linux/glibc is definitely a major platform for perl.

        • Efficient direct access to shared memory to track and update child state. There are times (like this one) where you know that a very efficient solution for a problem can be made by creating a real shared memory array of integers and indexing it directly from multiple threads of execution. Perl doesn't offer any clean api for this, although a semi-portable XS module could offer it with only a slight efficiency loss. "use threads" + "my @array : shared" doesn't even come close to realizing this, as anyone more familiar than me with perl internals knows. I think Net::Server::PreFork ends up communicating over a socket to its children, for example, because that's about the best you can hope for in generic platform-abstract perl-land.
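The single-port pattern from the first bullet can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the module's actual code: the names `make_listener`, `worker` and `prefork_demo` are hypothetical, each worker serves only one client, and the parent plays the client role so the sketch can be exercised on its own. Passing `IPPROTO_TCP` directly also shows that no `/etc/protocols` lookup is needed to create a TCP socket.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Listen on 127.0.0.1 with a kernel-chosen port; the bound address is
 * written back to *addr. IPPROTO_TCP is a compile-time constant. */
static int make_listener(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Worker: block in accept() on the inherited listener, serve one client. */
static void worker(int lfd)
{
    int cfd;
    alarm(5);                       /* demo safety net: never block forever */
    cfd = accept(lfd, NULL, NULL);  /* no lock is taken around accept() */
    if (cfd < 0)
        _exit(1);
    if (write(cfd, "ok", 2) != 2)
        _exit(1);
    close(cfd);
    _exit(0);
}

/* Fork nchildren workers sharing one listener, then connect once per
 * worker. Returns how many clients got the full reply, -1 on setup error. */
int prefork_demo(int nchildren)
{
    struct sockaddr_in addr;
    char buf[2];
    int i, lfd, forked = 0, served = 0;

    assert(nchildren > 0);
    lfd = make_listener(&addr);     /* created before any fork() */
    if (lfd < 0)
        return -1;

    for (i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            worker(lfd);            /* never returns */
        forked++;
    }

    for (i = 0; i < forked; i++) {
        int cfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        if (cfd < 0)
            continue;
        if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
            recv(cfd, buf, sizeof(buf), MSG_WAITALL) == 2 &&
            memcmp(buf, "ok", 2) == 0)
            served++;
        close(cfd);
    }

    close(lfd);
    while (wait(NULL) > 0)          /* reap every worker */
        ;
    return served;
}
```

Calling `prefork_demo(3)` forks three workers that all sit in `accept()` on the same descriptor; the kernel hands each incoming connection to exactly one of them. A production server would loop in the worker and respawn children, which is omitted here.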

        So in summary, what C had to offer over pure perl in this case was that it didn't egregiously waste system calls and disk i/o pointlessly for the purpose of ease of abstraction, and it allowed me direct access to real hardware shared memory arrays (which are available on many, many platforms, probably most that perl can run on), but epoll() had nothing to do with it really.
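To make the shared-array idea concrete, here is a minimal C sketch, assuming one `int` slot per child in an anonymous shared mapping created before the forks. The names `state_array_new`, `run_children` and the `SLOT_*` values are hypothetical and say nothing about the module's real layout. No lock is needed in this sketch because each child writes only its own slot and the parent reads only after reaping the children; a parent that polls live state would likely want atomic loads and stores instead.

```c
#define _GNU_SOURCE                 /* MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

enum { SLOT_IDLE = 0, SLOT_BUSY = 1, SLOT_DONE = 2 };

/* Map n ints that stay shared across fork(). Anonymous mappings start
 * zero-filled, so every slot begins as SLOT_IDLE. */
int *state_array_new(size_t n)
{
    void *p = mmap(NULL, n * sizeof(int), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (int *)p;
}

/* Fork n children; child i updates slot i directly in shared memory.
 * Returns how many slots ended up SLOT_DONE, or -1 if mapping failed. */
int run_children(size_t n)
{
    size_t i;
    int done = 0;
    int *state;

    assert(n > 0);
    state = state_array_new(n);
    if (state == NULL)
        return -1;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            state[i] = SLOT_BUSY;   /* child i writes only its own slot */
            /* ... a real worker would accept() and serve clients here ... */
            state[i] = SLOT_DONE;
            _exit(0);
        }
    }

    while (wait(NULL) > 0)          /* all children have exited after this */
        ;

    for (i = 0; i < n; i++)
        if (state[i] == SLOT_DONE)
            done++;

    munmap(state, n * sizeof(int));
    return done;
}
```

The parent sees the children's writes without any pipe or socket traffic, which is the kind of direct access the summary above is describing.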

        UPDATE: Having written that and now reflected on the issues further as a result, one of the key efficiency problems for the perl socket infrastructure in general is the attempt to abstract all "sockets" to look alike. Just because they are all "sockets" at some API level does not mean that it's a good idea to abstract all sockets together into a single class hierarchy, or to treat them the same within perl itself. The world would be a better place if tcp, udp, raw ip, unix, and any other distinct flavor of socket were uniquely different types in the core perl code, and if modules were written separately and specifically for each protocol. Doing it "right" for all of them in one generic chunk of code is damn near impossible. On top of that, the choice between udp, tcp, unix, raw ip, etc is a very big design decision for any socket user. You cannot arbitrarily switch socket types without rethinking and re-coding everything you do anyways, unless you're inviting bad design to begin with. Therefore there's not much gain from the abstraction. We have a case here of N things abstracted into a single interface which exhibit wildly different characteristics which always matter to the application at hand, as well as matter in terms of libc/kernel api code on the bottom side of perl.
Re: RFC - Linux::TCPServer (new module)
by CountZero (Bishop) on Oct 30, 2005 at 16:30 UTC
    Since it will run on Linux only, it seems to me that it should be called Linux::Net::TCPServer.

    Modules that run on Windows only have their own Win32::xxx namespace.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      No; they're named Win32::... because they're for interfacing to the Windows system itself, or some subsystem of it.

      We're building the house of the future together.

        I guess that gets at the heart of the matter.

        Just to be clear then, the standard is supposed to be that if the purpose of the module is to expose an OS/platform-specific interface to perl users, then it belongs in the ^Platform:: namespace, whereas if it implements a generic concept with the internals tailored to work on a certain OS/platform, then it belongs in the generic namespaces with the platform tacked on the end?

      I was originally of the Linux:: mind too (obviously), but everyone else so far has gone 3-0 in favor of Net::TCPserver::Linux. I was just about to give in and go start changing the name everywhere this morning when your dissenting opinion arrived, now you've given me an excuse to put it off for at least a few more hours and reconsider it some more :)

        Count one more voice for Net::Server::Linux, that way if someone gets around to adding a Win32 equivalent, it can be Net::Server::Win32, and so on for other platforms. When someone goes looking for a net server, they'll likely find what they are looking for in that namespace regardless of what platform they need it for.

        Win32::* (and by extension Linux::*) are (or should be) reserved for stuff that it simply makes no sense to try and port to other platforms.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

Node Type: perlmeditation [id://503887]
Approved by TStanley