
Re^2: Optimizing a large project.

by dextius (Monk)
on Jun 12, 2008 at 21:05 UTC ( #691800=note )

in reply to Re: Optimizing a large project.
in thread Optimizing a large project.

Heh, I agree with your first point. Some of it is flexibility, some of it is for code reuse, and some is claimed to be for "readability" (smaller functions being easier to digest than larger ones). On the note of compile-time safety: we have absolutely no margin for error, so we need to be 100% sure that the code will fail to compile if someone mistypes a variable name (use strict) or an attribute name. We deal in milliseconds, so any extra runtime handling in a routine would just slow us down.
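A minimal sketch of the compile-time check being relied on here: under strict, a mistyped variable name is a hard compile failure rather than a silently autovivified global.

```perl
use strict;
use warnings;

my $threshold = 5;

# Under strict, a typo such as the following fails at compile time:
#   print $treshold;  # Global symbol "$treshold" requires explicit package name
print "ok\n" if $threshold == 5;
```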

Replies are listed 'Best First'.
Re^3: Optimizing a large project.
by dragonchild (Archbishop) on Jun 12, 2008 at 21:24 UTC
    You're saying that use warnings FATAL => 'all'; with eval wrappers is a bad plan? You're saying that your test suite isn't adequate to exercise what has been changed?
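    The combination being suggested can be sketched like this: with lexically fatal warnings, anything that would normally only warn dies instead, so an ordinary eval wrapper catches it like any other exception.

```perl
use strict;
use warnings FATAL => 'all';

# With FATAL warnings, a would-be warning becomes an exception,
# so a plain eval wrapper catches it:
my $ok = eval {
    my $n;               # undefined on purpose
    my $sum = 1 + $n;    # "Use of uninitialized value" is fatal here
    1;
};
print $ok ? "no error\n" : "caught: $@";
```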

    Frankly, saying "We absolutely have no margin for error" is screaming "We need to have better plans for runtime error-handling." Perl can only do so much at compile-time. Are you saying that you don't have a simple hash-access with a variable? Nothing along the lines of if ( $foo{$bar} ) { ... }? Every single access is hard-coded? If that sort of safety is of such a major concern, you are using the wrong language.
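    One way Perl can push hash-key typos toward hard errors, without tying, is the core Hash::Util restricted-hash interface; the record fields here are hypothetical, but the mechanism is standard.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys);

my %order = ( symbol => 'XYZ', qty => 100 );   # hypothetical record
lock_keys(%order);                             # the key set is now frozen

my $qty = $order{qty};                          # fine
my $ok  = eval { my $x = $order{quanity}; 1 };  # typoed key name: dies
print $ok ? "typo missed\n" : "typo caught: $@";
```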

    That said, milliseconds is normal. Most webapps are expected to handle more than 1000 requests/second. Have you considered using some sort of tied hash that would restrict which keys can be used? There are several implementations on CPAN for that sort of thing. The cost of tying is about 15% total. Almost nothing for your response-time requirements.
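    A minimal sketch of such a key-restricting tie, built on core Tie::StdHash (the class name and allowed keys are made up for illustration):

```perl
package Tie::KeyGuard;          # hypothetical name
use strict;
use warnings;
use Carp;
use Tie::Hash;
our @ISA = ('Tie::StdHash');

my %allowed = map { $_ => 1 } qw(symbol qty price);

sub STORE {
    my ($self, $key, $val) = @_;
    croak "disallowed key '$key'" unless $allowed{$key};
    $self->SUPER::STORE($key, $val);
}

sub FETCH {
    my ($self, $key) = @_;
    croak "disallowed key '$key'" unless $allowed{$key};
    $self->SUPER::FETCH($key);
}

package main;
tie my %order, 'Tie::KeyGuard';
$order{qty} = 100;                          # fine
my $ok = eval { $order{quanity} = 1; 1 };   # typo: dies at runtime
print $ok ? "typo missed\n" : "typo caught\n";
```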

    And, error-handling isn't optional. Fast is worse than worthless if it's incorrect.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      The cost of tying is about 15% total

      Where did you get this measure?

      While developing Tie::Array::Packed I did some measurements and found my tied arrays (I know, we are talking about hashes here, but I don't think there is going to be any significant difference) to be around 15 times slower (that is, a 1400% performance penalty) than regular ones, and we are talking about a highly optimized XS implementation of the tie interface with almost no logic behind it. The equivalent pure Perl implementation Tie::Array::PackedC was 60 times slower.

      What this penalty represents inside the whole application is completely dependent on how often the tied objects are accessed. I don't think you can claim it is 15% or anything else; it just depends!
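      Anyone who wants their own number can measure the raw dispatch cost directly; this sketch compares a plain hash against one tied to a do-nothing Tie::StdHash subclass (the class name is made up, and the absolute rates will vary wildly by machine and perl build — only the ratio is interesting).

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use Tie::Hash;

package Tie::Passthrough;        # no logic at all: pure tie dispatch cost
our @ISA = ('Tie::StdHash');

package main;
my %plain = ( k => 1 );
tie my %tied, 'Tie::Passthrough';
$tied{k} = 1;

# Compare raw fetch cost of plain vs tied hash access.
timethese( 50_000, {
    plain => sub { my $v; $v = $plain{k} for 1 .. 100 },
    tied  => sub { my $v; $v = $tied{k}  for 1 .. 100 },
});
```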

        That's funny. I was trying to find the reference and I can't. It's a number that's been in my head for years. Could be wrong. Huh.

      I'm sorry, this isn't a web application (where you have multiple servers, cores and apache instances). This is a single threaded application, some of the most critical code in the system has been profiled to run in microseconds.

      Are we using the wrong language? It's funny you say that; it's been a topic around here recently. I talk to the C++ guys we work with and they tell me we'd be insane to try to move our business logic to C++; it is just too hard to manage long term. The only other options are Java or C# (I hope to avoid going down that path). Perl meets and exceeds our expectations of what is possible in a scripting language, and this post was an attempt to take it to the next level, based on what we thought MIGHT be the issues.

      I humbly disagree with your assessment of our approach to compile-time safety for attribute lookup. If you'd like to discuss this further, feel free to drop me a line (ryan-dot-dietrich-at-gmail-dot-com).
Re^3: Optimizing a large project.
by mr_mischief (Monsignor) on Jun 17, 2008 at 20:14 UTC
    Smaller subs are indeed easier to read than larger ones. However, it's often the case that the calling code is easier to read if an entire thought is finished by a sub rather than just a portion of a thought.

    Having more methods to call to do the same amount of work clearly adds to your call overhead. Don't make a method do several things, but don't be afraid to have it do all of one thing either. If a task takes three fairly simple steps and always takes the same three steps, those steps should be performed in sequence within the same method. Don't call one method, pass its return value to a second, and pass that result to a third. Just call the method, have it do all three steps, and work with the one return value.
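    A small illustration of that advice, with made-up step names and a made-up record format:

```perl
use strict;
use warnings;

# Instead of making the caller thread return values through three calls:
#   my $order = normalize( validate( parse($raw) ) );
# let one sub own all three steps of the (hypothetical) task:
sub parse_order {
    my ($raw) = @_;
    my %order = map { split /=/, $_, 2 } split /;/, $raw;   # parse
    die "missing qty\n" unless defined $order{qty};         # validate
    $order{symbol} = uc $order{symbol};                     # normalize
    return \%order;
}

my $order = parse_order('symbol=xyz;qty=100');
print "$order->{symbol} $order->{qty}\n";    # prints "XYZ 100"
```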

    If you're reusing code because several areas actually use substantially similar logic, then that's good. If you're reusing code that has lots of conditionals in it to make it reusable, then it may be better not to do that. A one-line method that's called by, say, eight different modules as a utility method makes sense if it's exactly the same line needed for each -- but if that's the case then those modules are probably redundant in some other ways, too, if they have that much in common.

    Larger buffers often lower accumulated I/O times, but don't spend so much more CPU time managing the buffers that you lose the advantage of having them larger. If you can, consider storing any data being input and output, at least in intermediate steps, in as compact a form as you can.

    Be sure your main bottleneck isn't thrashing your swap. I've sped some tasks up greatly simply by making sure the machine had adequate memory free at run time.

    Don't discount the importance of your environment. Sometimes I/O on a system is slow because you're competing with yourself. A system, for example, that has heavy data access and heavy logging and has its data and its logs on the same disk spindle is begging for the logs to be configured somewhere else. That's generally easy to do on any Unix-type system. Things like disabling or modifying the atime semantics, raising the delay on journal writes, or using a different type of filesystem can make worlds of difference, too.

    Since none of us know the specifics of your rather large project, none of us can give anything but generic advice. Test everything before you trust a suggestion to work in your specific case. Sometimes the biggest performance gains are from doing something unexpected because of the peculiarities of a project.
