Techniques to cache perl scripts into memory ?

by fx (Pilgrim)
on Dec 17, 2010 at 14:47 UTC ( [id://877648] )

fx has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

One of my systems is a large multi-user Linux setup with several hundred users frequently accessing a selection of utility scripts I've written in Perl. These aren't big or complicated scripts but at peak times the result can be several calls per second to several scripts.

Monitoring the resources of my machine I can see this is absolutely killing the disk. CPU and RAM seem to be ok. I've also tested running this from a Solaris platform (in case this was a Linux OS specific issue) but the same is observed there.

Perl and all modules are installed on a RAID1 pair of 15K 6G SAS disks. This is as far as I can go with the money available for hardware (no SSD I'm afraid!). I've also tested a single 15K disk (to remove any RAID overhead) but no joy.

We've had similar problems before with standard command line tools, and in those cases I statically compiled them and placed the resulting binaries in a RAM disk. The OS seemed to understand what I was after and the disk was hit a lot less. Of course, they were written in C and the same isn't really possible with Perl. I've tried "pp" but it doesn't seem to work out...

I've messed around with the idea of moving my utilities to a TCP socket based "service" (to save the massive number of short lived processes being started) and having users call into that. Problem there is that my users would have to rewrite a lot of their calling code and some basic performance testing didn't really blow me away.
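
To give an idea of what I mean by a "service", here is a much-simplified sketch (the port number and the dispatch are invented for illustration, not my real code):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use IO::Socket::INET;

    # One long-lived process listens on a TCP port and answers requests,
    # so perl itself is only started (and compiled) once.
    my $server = IO::Socket::INET->new(
        LocalPort => 9000,    # arbitrary port, for illustration only
        Listen    => 5,
        Reuse     => 1,
    ) or die "Cannot listen: $!";

    while ( my $client = $server->accept ) {
        my $request = <$client>;          # e.g. "some_tool arg1 arg2\n"
        next unless defined $request;
        chomp $request;
        my ( $tool, @args ) = split ' ', $request;

        # In the real thing each utility would be preloaded as a sub and
        # dispatched here; this sketch just echoes the request back.
        print {$client} "would run $tool with: @args\n";
        close $client;
    }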

I'm looking for some new suggestions to investigate. Has anyone met a similar issue? I know a good OS (and I'm counting Linux in that mix) should be caching regularly used things in memory anyway but I think I've simply got too many different little things being called for that to work. Any ideas as to how to cache Perl scripts into memory and leave my poor disk alone?........

fx, Infinity is Colourless


Re: Techniques to cache perl scripts into memory ?
by SuicideJunkie (Vicar) on Dec 17, 2010 at 15:08 UTC

    Why can't you just have an install of perl on that ramdisk again?

    For example, the perl directory on my PC (minus the \HTML and \EG) is only 150 megs, and that isn't much these days.

Re: Techniques to cache perl scripts into memory ?
by LanX (Saint) on Dec 17, 2010 at 15:37 UTC
    Sounds like you want to put your scripts into a daemon and map the calls to aliases or C wrappers which do the communication with this service.

    But I'd rather go first for SuicideJunkie's suggestion to start perl from a ram disk.

    Far less trouble and brain needed... otherwise too many side-effects to be considered...

    UPDATE

    like

    • forking for simultaneous access
    • setting uid of the user
    • restart on fail
    • adjusting special vars and ENV
    • avoiding possible exploits

    The only overhead you could save is the compilation (like in mod_perl or FastCGI), but is it worth the trouble?

    Cheers Rolf

Re: Techniques to cache perl scripts into memory ?
by mr_mischief (Monsignor) on Dec 17, 2010 at 21:14 UTC

    There are already some quite good suggestions here. I'll repeat some here and add a couple. Take the repetition as a sign I agree with SuicideJunkie, the Anonymous monk, LanX, and MidLifeXis and not that I failed to read their nodes.

    Disabling access time updates (noatime) is a good idea. If access times are in some way important to a process you have but you want better performance, the relatime option has also been available under Linux for the past few years. It takes a much smaller performance hit than regular access time updates.

    You should have enough memory in buffers, as already mentioned, to help with a lot of this. This is one more reason I think access times are one of the culprits (because they hit the disk even if the file was buffered).

    A RAM disk for Perl might help. A RAM disk for the programs might help. Either one would help partly for the speed of initial access, although again buffering should already be helping here. Reading from the RAM disk would also help alleviate access time updates, but again you can eliminate most (relatime) or all (noatime) of those anyway. Fitting both into a RAM disk would probably help more than one or the other, but one may make a much bigger difference than the other.

    If you're able to buy another spindle, especially another 15k spindle, then putting part of what you're accessing on that may help. When running systems with a lot of disk contention having the heaviest accesses to your directory hierarchy split onto file systems on different physical disks can make a huge difference.

    I had a mail server once that had its incoming spool (for both MX and outbound SMTP), outgoing spool (for sending outbound from the domains and to the POP3 server), its swap space, and its local copy of syslog output all on the same spindle when I inherited it from another admin. Whenever a piece of mail moved through the system, the log was updated. If a piece of mail was sent from within the customer base (it was a server for a small ISP we took over) to another customer, it would hit the disk coming in, when being virus scanned and spam scanned, when going to the outbound spool on its way to the POP3 server, and once or more for the syslog for each of those actions.

    In the short term, rather than waiting to order or build a new server, determine how to split the services, and make changes at the network level that would have to propagate, I just added another disk one Sunday morning about 3 AM. I kept the incoming spool, spam tests, and virus tests on the original disk. I moved the swap, the outgoing spool (which was rarely used except for network issues and MX servers for other domains being briefly unreachable), and the local copy of syslog output to the new disk. That disk came from a spares shelf and may have been only 5400 or even 7200 RPM.

    That server went from nearly unresponsive, with the drive access light for the primary storage drive on constantly, to running smoothly for another two or three months until we had a proper break-out plan for its services (which was actually an integration plan with our own server farm). This was a system I couldn't afford to babysit, because it sat in an unmanned data center half an hour away. I could babysit its fail-over server, but sending everything to the fail-over all the time is of course a bad idea.

    RAID 1 is great for data redundancy. It can help a fair bit with reading if your controller is that smart. It's always going to tie up both spindles for every write, though. If you're thrashing disks and part of that thrashing is caused by writes (like access time writes to inodes), then RAID 1 just assures you of thrashing two disks rather than one. Anything that's thrashing your disks and can be replaced readily (like your Perl system) could be on another disk in the machine that doesn't necessarily even need to be in RAID. Even if you downloaded perl from source and built it, you can back that build up off-server. If necessary, you could put two more drives in and make it live on another RAID 1 array. You might even put in three more drives and make it a RAID 5 array to sit beside the RAID 1 you have now. ("Beside" is of course conceptual; it doesn't need to be how the disks are actually arranged.) ;-)

    I know you've ruled out SSDs, but that may be premature, too. If you can afford 15k physical drives then you can probably afford small MLC SSDs. Your Perl system (without your programs that depend on it) would surely fit on a 32 GB SSD no matter how much you download from CPAN. Two cheap ones of those for a RAID 1 array cost about $100 from NewEgg right now. One of the tips for SSDs, though, is to use them with noatime to minimize writes. So there we're back to one of the original tips. If you can't afford to put $100 into your business-critical server, then you have some other serious issues as well.

    I don't mean to be rude, but some of your assertions seem to be based on incorrect assumptions about disk and filesystem management. As I've alluded to several times already, your whole directory structure indeed does not need to live on one filesystem. You can have mount points pretty much anywhere you want. You're not even limited to what the Linux installers suggest, such as /, /boot, /var, /home, and /usr, for mount points. You could have a separate /etc, and in some unusual cases that even makes sense. You can have /var/log as a mount point to separate it from the rest of the /var directory, and you could do the same thing with a spool directory for a mail server or a document root for a web server, for example. Sometimes /opt is a good choice for a separate filesystem, or even /tmp (and sometimes /tmp makes sense as a RAM disk). Sometimes on systems primarily for a single user I'll put their home directory (for example /home/chris) on its own partition and maybe even on a separate drive. Sometimes /lib, /usr, or /usr/local makes a lot of sense to separate out, especially if you generally need something like atime on most of the system but don't want to pay the price every time an executable is run or a library is opened.

    One of the sweetest things about all this is that you don't need to reinstall the whole OS to do it. You can put in a new disk, partition it, make filesystems on it, copy the data to the new filesystem, delete it from the old directory, and make the directory a mount point in your filesystem table (/etc/fstab for any Linux I've ever seen). How to do that in depth is probably more suited to some venue other than PerlMonks, though.

Re: Techniques to cache perl scripts into memory ?
by chrestomanci (Priest) on Dec 17, 2010 at 21:12 UTC

    I don't think the problem is the I/O of loading those perl scripts, the perl executable or associated perl libraries.

    Normally, the Linux VM subsystem should automatically cache frequently read files in RAM, in order to speed up future reads. If your users are running the same perl script(s) thousands of times per day, then those scripts, and everything required to run them, should be in RAM most of the time.

    The only situations I can think of where those files would not get cached are either that your system is under severe memory pressure (try running free and see how much swap is in use; it ought to be none or very little on a modern server system), or that you or someone else has been tweaking the VM subsystem's tunable parameters and stuffed them up. I don't think either scenario is likely.
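
    For what it's worth, the same check can be done from Perl by reading /proc/meminfo (Linux-specific); a quick sketch:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Rough check of memory pressure: swap in use and size of the page cache.
        my %mem;
        open my $fh, '<', '/proc/meminfo' or die "Cannot read /proc/meminfo: $!";
        while (<$fh>) {
            $mem{$1} = $2 if /^(\w+):\s+(\d+)\s+kB/;
        }
        close $fh;

        printf "Swap in use: %d MB\n", ( $mem{SwapTotal} - $mem{SwapFree} ) / 1024;
        printf "Page cache : %d MB\n", $mem{Cached} / 1024;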

    As SuicideJunkie has suggested, you could create a RAM disc, and copy your perl install there. That would force your kernel to keep the files in RAM, along with a load of other irrelevant files. I don't think that is a good idea. If your system is under memory pressure, the data will get moved to swap anyway, and if not it will make no difference.

    Instead I think you should look for other causes of your poor performance. Unfortunately, I can't come up with any constructive suggestions on that.

    Sorry I can't be positive help, but I do think you are chasing the wrong culprit.

Re: Techniques to cache perl scripts into memory ?
by MidLifeXis (Monsignor) on Dec 17, 2010 at 16:01 UTC

    Perhaps the Sticky bit on the perl binary (or even the library files - not the directories) might help. It is, however, OS specific.

    Update: This would not be my first choice, however, as I also believe that the OS should be able to do a better job, in most cases, of determining what to leave in memory. Setting this could actually be detrimental to overall performance of the machine. Instead, try installing on a ramdisk (Solaris /tmp is an example of this) or using something similar to FastCGI. See also persistent perl. Both of these suggestions have been mentioned elsewhere in the thread.
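
    If the FastCGI route is taken, the heart of it (using the FCGI module from CPAN) is an accept loop along these lines; a sketch only, with the output invented:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use FCGI;    # CPAN module implementing the FastCGI protocol

        # Compile once, then serve many requests from the same process.
        my $request = FCGI::Request();

        while ( $request->Accept() >= 0 ) {
            # The work one invocation of the utility used to do goes here.
            print "Content-Type: text/plain\r\n\r\n";
            print "handled by persistent process $$\n";
        }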

    --MidLifeXis

Re: Techniques to cache perl scripts into memory ?
by Anonyrnous Monk (Hermit) on Dec 17, 2010 at 15:40 UTC

    I'd agree with SuicideJunkie.  Also, I'm a bit surprised that disk IO appears to be the culprit here. Normally, the OS should buffer/cache read files in RAM (if there is enough free RAM) for the very purpose of being able to access them faster subsequently.

    You might also try telling the respective filesystem to not update inode access times (mount option noatime) — unless it's already set up that way, of course. I've found that this can provide quite a performance boost.
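
    A quick way to check whether it is already set up that way is to look at /proc/mounts (Linux-specific); a rough sketch:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # List each mount point and whether noatime/relatime is already in effect.
        open my $mounts, '<', '/proc/mounts' or die "Cannot read /proc/mounts: $!";
        while ( my $line = <$mounts> ) {
            my ( $dev, $point, $type, $opts ) = split ' ', $line;
            my ($atime) = $opts =~ /\b(noatime|relatime)\b/;
            printf "%-25s %s\n", $point, defined $atime ? $atime : 'atime (default)';
        }
        close $mounts;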

Re: Techniques to cache perl scripts into memory ?
by LanX (Saint) on Dec 17, 2010 at 16:21 UTC
    Just in theory:

    If your users are calling many scripts in a session, you could also consider using wrappers to start the perl binary only once.

    First start would compile all scripts (as subs) and at termination go into background and "freeze".

    The next wrapper-call would just need to find and "defreeze" the process and call the already compiled sub.

    The drawback would be the constant memory consumption of an independent perl process per user, and, as with mod_perl, you would have to be very prudent about any side-effects of your scripts.
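
    A very rough sketch of the compile-once part, ignoring exactly those side-effects (exit() in the scripts, %ENV, special vars, and so on); the directory and names here are invented:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Compile every utility script into a sub at startup; later "calls"
        # just invoke the already-compiled sub.
        my %utility;
        for my $file ( glob '/opt/tools/*.pl' ) {
            my $code = do {
                open my $fh, '<', $file or die "Cannot read $file: $!";
                local $/;
                <$fh>;
            };
            ( my $name = $file ) =~ s{ .*/ | \.pl\z }{}gx;

            # Scripts that call exit() or use __END__ would need more care.
            $utility{$name} = eval "sub { local \@ARGV = \@_;\n$code\n}"
                or die "Failed to compile $file: $@";
        }

        # A later "call" is then just a sub call, e.g.:
        # $utility{sometool}->( '--verbose', 'input.txt' );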

    Cheers Rolf

    Update

    Seems like MLX's suggestion of "persistent perl" follows more or less these ideas ... :)

Re: Techniques to cache perl scripts into memory ?
by JavaFan (Canon) on Dec 17, 2010 at 23:18 UTC
    I would first investigate and see what the disk is being used for. Reads? Writes? Is it reading the perl binary over and over again? Modules? Program text? C libraries? Swap? Some other questions I'd ask myself: Does the problem persist when you upgrade/downgrade the OS? Run a different OS? What else is running on the machine?
Re: Techniques to cache perl scripts into memory ?
by flexvault (Monsignor) on Dec 18, 2010 at 18:55 UTC

    These aren't big or complicated scripts but at peak times the result can be several calls per second to several scripts.

    Is there a subset of your scripts that is called more often than the rest?

    If you have this information, or can get it from logs, it might help you identify that subset of scripts and make them resident in RAM. This information may also help you look for bottlenecks within your own scripts. For example, a script that opens a temporary file on the same disk where it resides will not help your performance.
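
    If such a log exists, a small counting script along these lines can rank them (purely illustrative, assuming the script path is the first field of each log line):

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Count how often each script appears in an invocation log, assuming
        # the script path is the first whitespace-separated field per line.
        my %count;
        while ( my $line = <> ) {
            my ($script) = split ' ', $line;
            $count{$script}++ if defined $script;
        }
        for my $script ( sort { $count{$b} <=> $count{$a} } keys %count ) {
            printf "%8d  %s\n", $count{$script}, $script;
        }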

    I agree with the other suggestions offered you, but a better understanding of what is happening in your environment will help you solve the disk usage problem.

    Good Luck

    "Well done is better than well said." - Benjamin Franklin

Re: Techniques to cache perl scripts into memory ?
by sundialsvc4 (Abbot) on Dec 20, 2010 at 01:39 UTC

    My gut-feeling on all of this is ... “if, by now, the computer cannot ‘well-enough take care of itself,’ in these what-should-be familiar waters ...” then by now you really should be asking yourself, “why not?”   “What must I be doing wrong, here?”   “Why is what I am now doing, standing-out so terribly much from what tens-of-thousands of computer programmers before me chose to do?”