File Server for cgi scripts

by genecutl (Beadle)
on Aug 11, 2004 at 23:06 UTC

I have a web site with lots of Perl code that does various common things, such as generating HTML using a templating system and appending user-supplied data to text files. I've been doing this the obvious way; for example, a typical script may:

  1. open a template file, read from the file, close the file
  2. open a data file, read from the file, append to the file, close the file
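
In other words, something like this (the file names are just examples):

    use strict;
    use warnings;

    # Read the whole template into memory.
    open my $tmpl, '<', 'page.tmpl' or die "can't open template: $!";
    my $template = do { local $/; <$tmpl> };
    close $tmpl;

    # Append the user-supplied data to a text file.
    my $user_input = 'example entry';   # would come from the CGI request
    open my $log, '>>', 'guestbook.txt' or die "can't open data file: $!";
    print $log "$user_input\n";
    close $log;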

While going through a general code rewrite, I thought of setting up a file client/server approach: I create a server that reads the data and template files on startup and then binds to a port. The web code then acts as a client, requesting data or sending updates to the server as the case may be. Thus, template files only need to be read once. This seems better than storing it all in shared Apache memory, because then not every Apache process needs to hold every file for every script/module.

I coded a test client and server using Net::EasyTCP and it works well. Bashing it with ApacheBench gives a worse median response time but a better mean response time than the original non-client/server version of the code.
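
To make that concrete, here's a stripped-down sketch of the sort of pair I tested. The port number, template names, and the trivial protocol (the client sends a template name, the server replies with its cached contents) are placeholders rather than my actual code.

    # --- server: preload templates at startup, serve them over TCP ---
    use strict;
    use warnings;
    use Net::EasyTCP;

    my %cache;
    for my $file ('header.tmpl', 'footer.tmpl') {   # placeholder names
        open my $fh, '<', $file or die "can't read $file: $!";
        $cache{$file} = do { local $/; <$fh> };
        close $fh;
    }

    my $server = Net::EasyTCP->new(
        mode => 'server',
        port => 2345,
    ) or die "error creating server: $@\n";

    $server->setcallback(
        data => sub {
            my $client = shift;
            my $name   = $client->data();   # client sends a template name
            $client->send(exists $cache{$name} ? $cache{$name} : '');
        },
    ) or die "error setting callback: $@\n";

    $server->start() or die "error starting server: $@\n";

    # --- client (separate script, e.g. inside a CGI) ---
    # my $conn = Net::EasyTCP->new(
    #     mode => 'client',
    #     host => 'localhost',
    #     port => 2345,
    # ) or die "error connecting: $@\n";
    # $conn->send('header.tmpl') or die "send failed: $@\n";
    # my $template = $conn->receive();
    # $conn->close();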

My questions are:

  1. Is this a good idea? (It seems so to me, but am I missing something?)
  2. Am I re-inventing the wheel?
  3. Unix sockets vs. TCP? (My attempts at writing Unix socket code met with failure, and Net::EasyTCP made writing the server and client code very easy, so I just went with that.)

Re: File Server for cgi scripts
by mortis (Pilgrim) on Aug 12, 2004 at 02:40 UTC
    If you have the opportunity to use mod_perl with Apache, as has been mentioned, I'd suggest that you look into PerlRequire. PerlRequire allows you to specify a startup script (typically called startup.pl). In that script I usually load, via use/require statements, as many as possible of the modules (.pm files) that I'll be using within the server environment. I also write my sites as modules instead of straight CGIs, for a variety of reasons, this being one of them.

    mod_perl will execute that script at server startup, loading in much of the website. The important thing is that this is done before Apache forks its children, so much of the codebase can stay pre-loaded/pre-compiled in shared memory. Copy-on-write is a big win here.
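
    A minimal sketch of such a startup.pl (the paths and module names here are examples, not anything specific to the poster's site):

        # httpd.conf:
        #   PerlRequire /usr/local/apache/conf/startup.pl

        use strict;
        use lib '/usr/local/apache/lib/perl';   # example path to site modules

        # Pull in commonly used modules before Apache forks, so their
        # compiled code is shared copy-on-write across the children.
        use CGI ();
        use DBI ();
        use HTML::Template ();

        1;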

    This makes for a slightly slower startup time, but the improvements in memory consumption and initial request performance make for better overall performance.

    With mod_perl turned all the way up (no PerlRunOnce, etc.), you'll get better performance at the cost of greater memory usage, though with the improved performance you can usually tune down the number of servers.

    Another configuration with mod_perl/Apache is to put a non-mod_perl Apache instance in front of the mod_perl instance and have the non-mod_perl server handle all of the static content requests (like images, PDFs, plain HTML files, etc.). That frees the mod_perl servers to do the dynamic content generation.

    If you want to do this, I'd suggest looking into Apache's RewriteEngine directives. Mandrake used to set up multiple instances of Apache this way out of the box.
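
    For example, a minimal httpd.conf sketch for the front-end instance, assuming the mod_perl instance listens on localhost:8080 and dynamic URLs live under /dyn/ (both assumptions):

        # Serve static files directly; proxy dynamic requests to the
        # back-end mod_perl instance (requires mod_proxy for the [P] flag).
        RewriteEngine On
        RewriteRule ^/dyn/(.*)$ http://localhost:8080/dyn/$1 [P,L]
        ProxyPassReverse /dyn/ http://localhost:8080/dyn/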

    hth,

    Kyle

Re: File Server for cgi scripts
by derby (Abbot) on Aug 11, 2004 at 23:25 UTC
    It sounds like a simplistic approach to a trimmed-down webservice. Your approach may be just fine for your current (and future) needs, but the devil's in the details, and if you ever need to expand, you will end up re-implementing Apache.

    The route I take for a trimmed-down webservice is to run two different Apache instances: one on the normal port 80 and another on a different port (or a different machine). That way I can use the power of Apache to scale that back-end server with well-known and documented techniques.

    -derby
Re: File Server for cgi scripts
by tantarbobus (Hermit) on Aug 12, 2004 at 00:01 UTC

    First off, are you using mod_perl? That will give you a much better performance increase than trying to optimise the reading of a file.

    Even if you are, why not just let the OS decide which files to keep in memory, and instead look for other bottlenecks in your code where you'll get a better return on your effort?

    If you don't like the idea of trusting the OS to decide which files to cache, and you really want to make this faster because it is where the bottleneck is (which I doubt), you could just use a ramdisk/tmpfs and copy the files there before using them. That way you get the extra performance while still keeping a simple interface in your script (open, read, close) with fewer dependencies. Although you will probably not get the performance increase you were expecting, because the OS is normally pretty darn good at caching files.

    You might also want to look into FastCGI (http://www.fastcgi.com/), which allows you to run a CGI as a persistent process external to the web server; the web server then connects to the CGI over TCP or a Unix domain socket.
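
    A minimal sketch of a FastCGI script using the FCGI module; the one-time setup and the response body are placeholders:

        use strict;
        use warnings;
        use FCGI;

        # Expensive setup (reading templates, connecting to databases)
        # happens once, when the process starts.
        my $template = "<html><body>hello</body></html>";   # placeholder

        my $request = FCGI::Request();
        while ($request->Accept() >= 0) {   # loop once per incoming request
            print "Content-type: text/html\r\n\r\n";
            print $template;
        }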

Re: File Server for cgi scripts
by exussum0 (Vicar) on Aug 12, 2004 at 11:04 UTC
    This sounds similar to an NFS server.

    Good idea? Depends on how it becomes a component in your general system. If it's a centralized place for doing work once, versus doing it over and over again in every process, it could be good.

    Reinventing the wheel? As someone else posted, it sounds like a webservice. To me, it sounds like a souped-up NFS service. *** Look up the known issues with NFS before continuing! There are locking and timing issues to be concerned about! ***

    TCP. I don't *think* there are Unix sockets on Windows. :)

    Bart: God, Schmod. I want my monkey-man.

Re: File Server for cgi scripts
by iburrell (Chaplain) on Aug 12, 2004 at 16:17 UTC
    I doubt a separate server will improve performance. Modern OSes are smart about keeping files cached in RAM after they are read. The first time you read a template file, it is cached and all subsequent accesses are fast. The OS is even smart enough to update the disk cache when the file is written. The latency of talking to a server process is much higher.

    If the processing of the templates is complicated, there could be some advantage to caching the result. If you want a shared or persistent cache, storing it to disk makes the most sense. The disk cache will make reads efficient, and it is shared between all processes.
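
    A sketch of that kind of disk cache using Storable; regenerate() is a hypothetical stand-in for whatever expensive processing builds the result (it must return a reference, since Storable stores references):

        use strict;
        use warnings;
        use Storable qw(store retrieve);

        sub cached_result {
            my ($template_file, $cache_file) = @_;
            # Reuse the cached copy if it is newer than the template.
            if (-e $cache_file && -M $cache_file < -M $template_file) {
                return retrieve($cache_file);
            }
            my $result = regenerate($template_file);   # hypothetical
            store($result, $cache_file);   # shared by all processes
            return $result;
        }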

Re: File Server for cgi scripts
by perrin (Chancellor) on Aug 12, 2004 at 20:32 UTC
    Net::EasyTCP uses a single process multiplexing model, which is pretty good. The downside is that it only handles one request at a time, so when it is busy running one of your functions, everyone else waits. I think you'll find this is a problem if you send in a lot of requests in parallel. The alternative is to fork, which would lose the only advantage it had over mod_perl/FastCGI.
Re: File Server for cgi scripts
by Gilimanjaro (Hermit) on Aug 13, 2004 at 11:03 UTC
    Sounds to me like mod_perl plus HTML::Template with caching enabled may do most of what you need...
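
    For instance (the file name and parameter are examples); with cache => 1, each mod_perl process parses the template file only once and reuses the parse tree on later requests:

        use strict;
        use warnings;
        use HTML::Template;

        my $template = HTML::Template->new(
            filename => '/www/templates/page.tmpl',
            cache    => 1,   # keep the parsed template in memory
        );
        $template->param(title => 'Hello');
        print $template->output;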
