http://qs321.pair.com?node_id=691728

dextius has asked for the wisdom of the Perl Monks concerning the following question:

I'm doing some research on how I can optimize our Perl application. It's fairly large (100k lines). I've run both Devel::DProf and Devel::NYTProf to try to fix some of the outliers, but we think there are some things we're not seeing (based on the output).

If anyone can comment on the following, we'd greatly appreciate it.

First, some background: Our application is separated into many libraries (almost 1k) and within those libraries, many many methods (over 3k total).

1. We found this: http://www.perl.com/pub/a/2000/06/dougpatch.html ... (we realize this is very old, and things have probably changed, but I thought I'd better ask)... We're worried that method call overhead isn't taken into account in any of our profiling runs. I couldn't find any resolution to that thread either. If anything, we'd gladly take a pragma to lock in inheritance to get the performance boost. We call a LOT of methods in our system; if we can make calling them faster, it would make a difference. (A small benchmark sketch for this point follows after this list.)

2. We're using "fields", since we desperately need compile-time safety for attribute lookups, and I'm told it's faster than using a hash directly, because the lookups become array lookups. We're also researching a move to Perl 5.10, because of a purported speed boost and a decreased memory footprint. Lastly, we heard that fields changed after 5.9 and that 5.10 has a "new" way of doing this type of thing altogether, but I haven't seen any benchmarks or documentation detailing the implementations or any rationale for using/not using them. (A short fields sketch follows after this list.)

3. We use SWIG to talk to shared memory. Does anyone have any idea of the performance difference between hand-rolled XS and SWIG for reading attributes off a C struct? (Or is there some even "better" way to do this kind of thing that I've never heard of?)

4. I read the section of perldoc concerning inline (constant) functions. I'm a little confused: you can't have an inlined function if it takes any arguments? I need to run some benchmarks of my own on this, but it's not something I see mentioned in any of my Perl books. (Is memoization of a method call even possible? A sketch touching on both questions follows after this list.)
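
For point 1, a minimal benchmark sketch of the kind of measurement we have in mind, using the core Benchmark module; the class and method names here are just placeholders:

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    package Widget;
    sub new  { return bless {}, shift }
    sub poke { return 1 }                        # trivial body: isolates call overhead

    package main;
    my $w = Widget->new;

    cmpthese( -3, {                              # run each variant for at least 3 CPU seconds
        method_call => sub { $w->poke },             # dynamic dispatch on every call
        direct_call => sub { Widget::poke($w) },     # bypasses method resolution entirely
    });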
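
For point 2, a minimal sketch of the fields idiom we are relying on (the class and field names are made up); the constructor shape follows the fields documentation:

    use strict;
    use warnings;

    package Order;
    use fields qw(symbol qty price);

    sub new {
        my Order $self = shift;
        $self = fields::new($self) unless ref $self;
        return $self;
    }

    package main;
    my Order $o = Order->new;
    $o->{symbol} = 'XYZ';    # a declared field: fine
    # A typo in a literal key, e.g. $o->{symbl}, is rejected: at compile time with
    # the typed lexical on older perls, at run time via the restricted hash on newer ones.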
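
For point 4, a small sketch of both ideas: perl only inlines "constant functions" (an empty () prototype and a constant-foldable body, hence no arguments), and the core Memoize module can cache the results of expensive calls. The sub names are hypothetical:

    use strict;
    use warnings;
    use Memoize;

    sub TICK_SIZE () { 0.01 }            # eligible for inlining: no arguments, constant body
    sub scale { $_[0] * TICK_SIZE }      # takes arguments, so it stays a normal sub call

    sub slow_lookup {                    # a pure function of its arguments
        my ($key) = @_;
        # ... expensive work would go here ...
        return length $key;
    }
    memoize('slow_lookup');              # later calls with the same $key hit a cache

    # Memoizing a *method* is possible, but the invocant becomes part of the cache key
    # (as a stringified reference), so it is only safe when the result depends solely
    # on the arguments and on object state that never changes.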

Update:

The dprofpp -r run produced some interesting results (1.5 hours of data collected).



Based on this, we definitely have some more work to do. Thanks to everyone for their insight on the issue.

Replies are listed 'Best First'.
Re: Optimizing a large project.
by salva (Canon) on Jun 12, 2008 at 17:45 UTC
    3. We use SWIG to talk to shared memory. Does anyone have any idea of the performance difference between hand-rolled XS and SWIG for reading attributes off a C struct? (Or is there some even "better" way to do this kind of thing that I've never heard of?)

    SWIG wrappers are very inefficient!

    The way it uses a tie interface to expose C struct attributes, and the double wrapping of objects it performs, make it more than an order of magnitude slower than an equivalent hand-crafted XS interface (note that I am talking exclusively about the wrapping; obviously the C functions behind it take the same time whether you use SWIG or XS!).

      Awesome, thanks for your input... /me breaks open copy of "Extending and Embedding Perl"..

        I'm no XS or Inline::C guru. Take the following comments with a grain (or more) of salt.

        I've used Inline::C a few times and it is pretty slick for quickly hacking together a bit of C code and Perl. Couldn't be any easier.

        I understand that I::C autogenerates XS code behind the scenes. I have seen multiple recommendations here that a good way to get up and running with XS is to use I::C to generate a basic chunk of XS and then switch to pure XS for production use.

        I've never needed to take this final step with my I::C hackery, but based on other reports it's worth investigating.
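
        By way of illustration, a minimal Inline::C sketch of the kind of thing being discussed here: a C function that reads a field out of a struct and is callable from Perl. The struct, field, and function names are made up, and the pointer handling is deliberately simplistic (a raw address passed around as an integer).

            use strict;
            use warnings;
            use Inline C => q{
                typedef struct {
                    int count;
                } my_struct;

                /* pretend this address came from the shared-memory layer */
                int get_count(IV addr) {
                    my_struct *p = (my_struct *)addr;
                    return p->count;
                }
            };

            # given an integer address $addr handed over by the C side:
            # my $count = get_count($addr);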

        Some other tools are mentioned in this thread.


        TGI says moo

Re: Optimizing a large project.
by perrin (Chancellor) on Jun 12, 2008 at 18:51 UTC
    First, is the problem that it takes too long, or that it uses too much CPU? If it's that it takes too long, you need to profile wall time, not CPU time. If your application does any significant I/O, that's almost certainly the time bottleneck, and you won't see it when profiling CPU time. Until you've done this, forget about method call overhead and inline functions.

    Next, stop using fields. The feature you're referring to was called pseudohashes and it was killed a long time ago (5.8 I think) because it caused problems and didn't deliver on the expected improvements. If you want hashes that don't autovivify keys, use Hash::Util::lock_keys(). If you're not on 5.8 yet, it would be a good move to get there.
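
    For what it's worth, a minimal lock_keys() sketch (the hash contents are made up); Hash::Util ships with perl as of 5.8:

        use strict;
        use warnings;
        use Hash::Util qw(lock_keys);

        my %order = ( symbol => 'XYZ', qty => 100 );
        lock_keys(%order);           # from here on, only the existing keys may be used

        $order{qty} = 200;           # fine: the key already exists
        # $order{quantity} = 200;    # dies at run time: disallowed key in a restricted hash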

      Our processes are not run-and-shut-down jobs: they start up in the morning, process all events given to them, and then shut down in the afternoon. We're measuring the response time between the timestamp of the event and when we responded to it.

      We do have fairly significant IO, both to disk and the network. The file writes are buffered, so we don't think that's our problem.

      Wow. Fields is that bad, huh? I'll look into lock_keys. I need to re-read the 5.9 fields note in the perldoc; I think they mentioned the move away from pseudohashes.
        We're measuring the response time between the timestamp of the event and when we responded to it.
        So, that would be wall time. Profile with wall time then.
        We do have fairly significant IO, both to disk and the network. The file writes are buffered, so we don't think that's our problem.
        If you never measured it, you don't really know. I/O is nearly always the problem on modern hardware, unless you're doing 3D rendering or something similarly compute intensive.
        To know where the problem lies, whether in IO or in CPU usage, all you need to do is run top or any similar monitoring tool while your program is running. Roughly, if it shows CPU usage going over 90%, then your problem is CPU; below 30%, it is IO; and in the middle, it is both.

        Well, this method will not work if your application just receives a few events per minute that are processed in less than a second, below top's (and your eyes'!) resolution.

Re: Optimizing a large project.
by dragonchild (Archbishop) on Jun 12, 2008 at 19:44 UTC
    A few thoughts:
    • If you have things as libraries, make those libraries into profilable objects. I would spend time combining libraries and refactoring away methods. In my experience, 1000 libraries and 3000 methods suggests someone too steeped in Java and not enough in real-world software engineering.
    • Why are you so concerned with compile-time safety? Wouldn't it be better to spend some time worrying about error-management?

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      Heh, I agree with your first point. Some of it is flexibility, some of it is for code reuse, and some is claimed to be for "readability" (smaller functions being easier to digest than larger ones). On the note of compile-time safety: we have absolutely no margin for error; we need to be 100% sure that the code will fail to compile if someone mistypes a variable (use strict) or an attribute name. We deal in milliseconds, so any extra handling of a routine would just slow us down.
        You're saying that use warnings FATAL => 'all'; with eval wrappers is a bad plan? You're saying that your test suite isn't adequate to exercise what has been changed?
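
        To make that concrete, a minimal sketch of the FATAL-warnings-plus-eval-wrapper pattern; the event handler and the bad input are hypothetical stand-ins:

            use strict;
            use warnings FATAL => 'all';        # any warning in this file now dies

            # hypothetical handler standing in for the real application code
            sub handle_event {
                my ($event) = @_;
                return $event->{qty} + 1;       # would warn (and now die) if qty were undef
            }

            my $event = { qty => undef };       # deliberately bad input for the example

            my $ok = eval { handle_event($event); 1 };
            print STDERR "event failed, trapped: $@" unless $ok;    # recovery policy goes here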

        Frankly, saying "We absolutely have no margin for error" is screaming "We need to have better plans for runtime error-handling." Perl can only do so much at compile-time. Are you saying that you don't have a simple hash-access with a variable? Nothing along the lines of if ( $foo{$bar} ) { ... }? Every single access is hard-coded? If that sort of safety is of such a major concern, you are using the wrong language.

        That said, milliseconds is normal. Most webapps handle more than 1000 requests/second. Have you considered using some sort of tied hash that would restrict which keys can be used? There are several implementations on CPAN for that sort of thing. The cost of tying is about 15% total. Almost nothing for your response time requirements.

        And, error-handling isn't optional. Fast is worse than worthless if it's incorrect.


        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
        Smaller subs are indeed easier to read than larger ones. However, it's often the case that the calling code is easier to read if an entire thought is finished by a sub rather than just a portion of a thought.

        Having more methods to call to do the same amount of work clearly adds to your call overhead. Don't make a method do several things, but don't be afraid to have it do all of one thing either. If a task takes three fairly simple steps and always takes the same three steps, those should be serialized in the same method. Don't call one method and then pass its return value through two more. Just call the method, have it do all three, and work with the one return.
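
        As a small before/after sketch of that point (the class and method names are invented for illustration):

            use strict;
            use warnings;

            package Codec;
            sub new       { bless {}, shift }
            sub decode    { my ($self, $raw) = @_; return lc $raw }
            sub validate  { my ($self, $msg) = @_; die "empty message\n" unless length $msg; return $msg }
            sub normalize { my ($self, $msg) = @_; $msg =~ s/\s+/ /g; return $msg }

            # one method that finishes the whole thought: one dispatch instead of three
            sub ingest {
                my ($self, $raw) = @_;
                my $msg = lc $raw;
                die "empty message\n" unless length $msg;
                $msg =~ s/\s+/ /g;
                return $msg;
            }

            package main;
            my $codec = Codec->new;

            # before: the caller strings the steps together, paying three dispatches
            my $before = $codec->normalize( $codec->validate( $codec->decode("Hello  World") ) );

            # after: same result, two fewer method calls
            my $after = $codec->ingest("Hello  World");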

        If you're reusing code because several areas actually use substantially similar logic, then that's good. If you're reusing code that has lots of conditionals in it to make it reusable, then it may be better not to do that. A one-line method that's called by, say, eight different modules as a utility method makes sense if it's exactly the same line needed for each -- but if that's the case then those modules are probably redundant in some other ways, too, if they have that much in common.

        Larger buffers often lower accumulated I/O times, but don't spend so much more CPU time managing the buffers that you lose the advantage of having them larger. If you can, consider storing any data being input and output, at least in intermediate steps, in as compact a form as you can.

        Be sure your main bottleneck isn't thrashing your swap. I've sped some tasks up greatly simply by making sure the machine had adequate memory free at run time.

        Don't discount the importance of your environment. Sometimes I/O on a system is slow because you're competing with yourself. A system, for example, that has heavy data access and heavy logging and has its data and its logs on the same disk spindle is begging for the logs to be configured somewhere else. That's generally easy to do on any Unix-type system. Things like disabling or modifying the atime semantics, raising the delay on journal writes, or using a different type of filesystem can make worlds of difference, too.

        Since none of us know the specifics of your rather large project, none of us can give anything but generic advice. Test everything before you trust a suggestion to work in your specific case. Sometimes the biggest performance gains are from doing something unexpected because of the peculiarities of a project.

Re: Optimizing a large project.
by Herkum (Parson) on Jun 12, 2008 at 17:46 UTC

    It sounds like you are trying to swallow a whale; you will end up choking on it if you do.

    You need to spend time breaking down the libraries into stuff that is smaller and easier to work with. Then you can focus more on optimization.

    If you don't have a handle on your processes and your code, you will not be able to optimize for performance, at least not in any sane way.

      None of the method calls show up as huge consumers of time (other than some of the SWIG logic).

      I'm concerned that if we broke the libraries down any further, we'd just exacerbate our method call overhead issue.

      Yes, it is a whale. I believe we have a handle on it, given two large-scale profiling runs; I simply don't know how to measure things beyond what the profiler provides (if that is even our issue to begin with).

        I guess the next question to ask is: how do you manage your data? Can you simplify your data structures? Can you limit your data sets? Can you move data processing to something more optimized for it, like a database?

        Optimized code and data structures can only take you so far.

Re: Optimizing a large project.
by grinder (Bishop) on Jun 12, 2008 at 19:49 UTC
    We call a LOT of methods in our system; if we can make calling them faster, it would make a difference.

    In 5.10 you can deploy different method-resolution (lookup) strategies. Download a copy of 5.10.0 and look at MRO (perldoc mro). You'll also find information in Class::C3. I have no idea whether this will actually help you: you'll have to set up a testbed and try it out.
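
    A minimal sketch of what that looks like on 5.10 (the class names are hypothetical; whether it actually buys you anything is exactly what the testbed would tell you):

        use strict;
        use warnings;

        package Feed::Base;     sub poll { 'base' }
        package Feed::Cached;   use base 'Feed::Base';
        package Feed::Realtime; use base 'Feed::Base';

        package Feed;
        use base qw(Feed::Cached Feed::Realtime);
        use mro 'c3';    # C3 linearization for this class instead of the default depth-first search

        package main;
        print join(' ', @{ mro::get_linear_isa('Feed') }), "\n";
        # with c3: Feed Feed::Cached Feed::Realtime Feed::Base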

    • another intruder with the mooring in the heart of the Perl

Re: Optimizing a large project.
by Zen (Deacon) on Jun 12, 2008 at 18:24 UTC
    Did you set a performance goal to reach? That will help you prioritize whether going after 3 easier fixes of small benefit vs 1 hard fix of medium benefit is the correct strategy.
      Yes, we have multiple sets of metrics capable of detailing our improvements, once we have made them. We know where we need to be, and are willing to do all the tasks (and more) if that is what it takes to get to our goals.
Re: Optimizing a large project.
by tsee (Curate) on Jun 13, 2008 at 14:14 UTC

    "We call a LOT of methods in our system, if we can make calling them faster, it would make a difference."

    There is a TODO item in the perltodo about a potential speed-up of the entersub op, which is executed on every sub call:

    entersub-XS-vs-Perl

    I had a brief look at it and while I'm not exactly a core hacker, that task seems rather difficult. It's also no fun, so you most likely won't find somebody to do it in his or her spare time. My gut feeling is that you'd have to either try to do that within your organization or try to find a perl core hacker willing to tackle the issue and pay him for it.

    A cheaper but certainly tiny optimization might be to replace accessor methods a la sub get_foo { $_[0]->{foo} } with use Class::XS getters => [qw(foo)];. (Same thing for setters, of course.) Unfortunately, that won't fly if you want to do any checking in your accessor methods. Then, you could "steal" the corresponding XS from the aforementioned module and use it as a template to roll your own.
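
    As a rough baseline before trying that, here is a sketch (hypothetical class name) comparing the pure-Perl accessor above with raw hash access; an XS-based accessor would slot into the same harness:

        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        package Tick;
        sub new     { bless { foo => 42 }, shift }
        sub get_foo { $_[0]->{foo} }        # the accessor style mentioned above

        package main;
        my $t = Tick->new;

        cmpthese( -3, {
            perl_accessor => sub { my $v = $t->get_foo },
            raw_hash      => sub { my $v = $t->{foo} },
        });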

    Cheers,
    Steffen

Re: Optimizing a large project.
by BrowserUk (Patriarch) on Jun 12, 2008 at 22:24 UTC

      Please expand.

        Expand in what way?

        It seems the OP's not interested, and as it would take a substantial amount of effort on my behalf to elaborate, I won't. It's not something that you can really describe in the abstract.

        I'd need one or two critical class implementations (definitions) to operate upon, and that means profiling. And if they are proprietary, then they would need sanitizing, i.e. changing class, method, attribute, and parameter names to garbage so their purpose is obscured.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.