PerlMonks
Increasing CPU Usage/Decreasing Run Time

by NathanE (Beadle)
on Jul 25, 2005 at 21:42 UTC [id://477987]

NathanE has asked for the wisdom of the Perl Monks concerning the following question:

Although this is the opposite of many people's typical problem, I need to increase the amount of CPU my program is using.

I have been tasked with converting an old VB6 program to Perl. I have now completed it, and it is working fine (read as: the output is correct). My problem is that it takes several times longer to run in Perl than the original did in VB6. Watching the process, I noticed that it uses only 3-4% of the CPU on average while running.

Is there any way to force Perl to run faster by using more CPU time? The issue can be seen in this profiler output:

    BerkeleyDB::AUTOLOAD has -1 unstacked calls in outer
    BerkeleyDB::__ANON__ has 1 unstacked calls in outer
    Total Elapsed Time = 261.6975 Seconds
      User+System Time = 14.43755 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c Name
     46.1   6.667  6.667 124415   0.0001 0.0001 BerkeleyDB::Common::db_get
     41.0   5.924 15.515      1   5.9238 15.515 main::ProcessLog
     15.0   2.167  8.834 124415   0.0000 0.0001 BerkeleyDB::_tiedHash::FETCH
     3.89   0.561  0.561  22983   0.0000 0.0000 BerkeleyDB::Common::db_put
     1.36   0.197  0.757  22983   0.0000 0.0000 BerkeleyDB::_tiedHash::STORE
     0.11   0.016  0.016      3   0.0053 0.0053 vars::BEGIN
     0.11   0.016  0.016      1   0.0160 0.0160 XSLoader::load
     0.10   0.015  0.015      1   0.0150 0.0150 BerkeleyDB::Term::close_everything
     0.10   0.015  0.015      5   0.0030 0.0030 BerkeleyDB::Hash::_db_open_hash
     0.00       - -0.000      1        -      - DynaLoader::dl_load_file
     0.00       - -0.000      1        -      - DynaLoader::dl_undef_symbols
     0.00       - -0.000      1        -      - DynaLoader::dl_find_symbols
     0.00       - -0.000      1        -      - DynaLoader::dl_install_xsub
     0.00       - -0.000      1        -      - BerkeleyDB::bootstrap
     0.00       - -0.000      1        -      - BerkeleyDB::constant

The main thing there being that only 14 CPU seconds are used out of 261 seconds of real run time. This was done on a file significantly smaller than production, so I'm sure you can see how expensive this becomes as the input grows.

Any help on getting this moving at a more appropriate pace would be greatly appreciated. I am thinking about dividing the input file into multiple files and running multiple threads to speed this up, but would like to see some potential alternatives first.
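The splitting idea mentioned above could be sketched roughly as follows. This is hypothetical and assumes the records are independent of one another; it pre-creates three tiny chunk files as stand-ins for a pre-split log, fans out one child process per chunk with fork(), and has each child write its result (here, just a line count) to its own output file to avoid sharing state between processes.

```perl
use strict;
use warnings;

# Create three tiny demo chunk files (stand-ins for a pre-split log).
for my $i (1 .. 3) {
    open my $out, '>', "part$i.log" or die "Cannot write part$i.log: $!";
    print {$out} "record $_\n" for 1 .. 5;
    close $out;
}

my @chunks = map { "part$_.log" } 1 .. 3;
my @pids;

for my $chunk (@chunks) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {             # child: process one chunk, then exit
        process_chunk($chunk);
        exit 0;
    }
    push @pids, $pid;            # parent: remember the child
}
waitpid($_, 0) for @pids;        # wait for every child to finish

sub process_chunk {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot open $file: $!";
    my $count = 0;
    $count++ while <$in>;
    close $in;
    (my $out_name = $file) =~ s/\.log$/.count/;
    open my $out, '>', $out_name or die "Cannot write $out_name: $!";
    print {$out} "$count\n";
    close $out;
}
```

Note that if the bottleneck really is disk IO rather than CPU, parallel workers may just contend for the same disk, so this is worth measuring before committing to it.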

Thanks!

Replies are listed 'Best First'.
Re: Increasing CPU Usage/Decreasing Run Time
by BrowserUk (Patriarch) on Jul 25, 2005 at 22:03 UTC

    Sounds like you're accessing a large volume of data through a small buffer (thereby forcing lots of IO)?

    What options are you using when you create/open your DB?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
      Yeah - that would make sense as well. The only thing that contradicts that is that I'm using tied hashes. I've untied the actual database from them and let them run just as plain hashes, and still encountered the same problem.

      Here's how I open the BerkeleyDB hash tables, though:

          tie %$file, "BerkeleyDB::Hash",
              -Filename => $file,
              -Flags    => DB_CREATE
              or die "Cannot open file\n";
      ############ UPDATE #############

      I just re-ran, again, with those hashes untied. That's exactly what the problem is: it's taking too much IO to the hash table. Any idea how to speed this up? I'm thinking about a process that dumps to the hash table after all the hard processing is done.

      Comments/Suggestions?
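        A minimal sketch of that "dump at the end" idea: do all the heavy processing against a plain in-memory hash, then write the results out through the tie in one pass when the work is done. Since the tie interface has the same shape, this sketch uses the core SDBM_File module as a stand-in for BerkeleyDB::Hash so it runs anywhere; the key names and the "processing" loop are made up.

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;

my %work;    # plain in-memory hash: fast, no per-access disk IO

# ... all the heavy processing happens here, against %work only ...
$work{"key$_"} = $_ * 2 for 1 .. 1000;   # stand-in for the real work

# One pass of writes at the end, instead of disk IO on every access.
tie my %db, 'SDBM_File', 'results', O_RDWR | O_CREAT, 0666
    or die "Cannot open results: $!";
$db{$_} = $work{$_} for keys %work;
untie %db;
```

The trade-off is memory: this only works if the whole working set fits in RAM, which is presumably why the tied-hash approach was used in the first place.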

        I've used DB_File rather than BerkeleyDB::Hash, but I assume that the options available are similar. They are mentioned for the different DB types as part of the DB_File docs. I would assume that you are using what DB_File refers to as a DB_File::HASHINFO. The parameters you probably need to consider varying are cachesize, bsize & ffactor.

        However, the DB_File docs give no information on how to vary these options for performance, e.g. an ever-bigger cache does not always yield better performance.

        Optimising the options requires a fairly keen understanding of the nature of your data and the usage patterns of your application.

        I did find that the documentation here, particularly section 2, was useful, but be prepared for doing a lot of experimentation.

        The best guide I found to performance tuning was this page. Unfortunately, much of the advice relies upon your having access to one or more of the Berkeley DB utilities, which I never located for Win32. Nonetheless, the information on that page proved very useful as a guide to some trial & error testing.
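        For BerkeleyDB::Hash specifically, the analogous knobs can be passed at tie time. A hypothetical sketch follows: the option names (-Cachesize, -Ffactor, -Nelem) are real BerkeleyDB.pm options, but the filename and the values shown here are invented and would need tuning against real data and access patterns.

```perl
use strict;
use warnings;
use BerkeleyDB;

my $file = 'data.db';    # hypothetical filename

tie my %h, 'BerkeleyDB::Hash',
    -Filename  => $file,
    -Flags     => DB_CREATE,
    -Cachesize => 16 * 1024 * 1024,   # 16 MB cache instead of the default
    -Ffactor   => 1024,               # desired density (keys per bucket)
    -Nelem     => 200_000             # estimate of the final element count
    or die "Cannot open $file: $! $BerkeleyDB::Error\n";
```

        As noted above, bigger is not automatically better for the cache; measure each change.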


        perrin has written some great stuff on optimizing BerkeleyDB cache sizes and other IO related stuff. I'd suggest doing a Super Search on stuff written by perrin that also uses the word BerkeleyDB. There's a bunch in there and I'm not going to pre-filter it for you right now.
find the IO break
by Anonymous Monk on Jul 26, 2005 at 08:45 UTC
    Since it's not the CPU that slows down your program, it must be filesystem IO. But since I don't know what IO your program generates, there's no way I can help. Other people have already pointed at DB optimizations, but I wouldn't be surprised if your program does some other kind of IO that really slows it down.
Re: Increasing CPU Usage/Decreasing Run Time
by Anonymous Monk on Jul 26, 2005 at 09:17 UTC
    If there's a large discrepancy between your wall clock time and the sum of your user and system time, it (usually) means something else is being done. It could be that the CPU is giving other programs a chance to run as well. It could be that the program is waiting for the disk. It could be that it's waiting for the network.

    But whatever it is, it's not something that can be determined from here with the output of the profiler. The profiler just shows the breakdown of how the program spent the allotted CPU time - it doesn't give any indication why it didn't get the CPU for the other 247 seconds.

    If you have a real OS, like Solaris or HP-UX, you can use tools like dtrace(1) or glance, which will give you a lot of information.

      From a purely windows perspective, there are a lot of tools that can help as well.

      The excellent Process Explorer from Sysinternals will allow you to see what perl.exe has threaded beneath it. I often find with lots of disk-intensive operations that Perl itself takes up very little CPU, but the kernel is running quite high (as it's the kernel that's handling the disk IO).

      You may also want to try bumping up the thread priority (vanilla task manager can do that), but I doubt it will make much difference.

Re: Increasing CPU Usage/Decreasing Run Time
by hakkr (Chaplain) on Jul 26, 2005 at 07:42 UTC

    On Linux/Unix you should be able to 'nice' the process to affect its scheduling priority (how much CPU time it gets).

    You usually have to be root to nice upwards and increase priority, so try 'nice -19 script.pl'; the max value for nice is -20, so that should give your script priority over whatever is hogging the CPU. You can nice processes via the 'top' interface as well. Update: as pointed out below, nice rather annoyingly uses -20 as the highest priority.
      Firstly, the numbers for nice work the other way round, i.e. 20 means run at the lowest priority. Secondly, this can do no good -- programs don't slow down just to be awkward, such that a nicer priority could persuade them to hurry up; there is always some other reason for it.

      In this case the process is spending too much time waiting for database interactions to complete and change of priority cannot change that behaviour. But if it hadn't been that it would have been some other I/O activity.

      Finally, on Unix, 20 or more is equivalent to 19, which is the maximum value, i.e. the lowest priority.

      One world, one people

        I know programs don't slow down to be awkward, but they do slow down if another process with a higher priority is hogging the CPU.
Re: Increasing CPU Usage/Decreasing Run Time
by qq (Hermit) on Jul 26, 2005 at 22:05 UTC

    Try also dprofpp -r. This will show the "real" (wall-clock) times of the subroutines, and show you where those 200+ seconds are used.

Re: Increasing CPU Usage/Decreasing Run Time
by techcode (Hermit) on Jul 26, 2005 at 14:40 UTC
    I don't know why no one has mentioned this - but maybe you're using the wrong code?

    Just a thought ... but if you literally translated from VB to Perl, it's probably not optimised for the Perl way of doing things ...

      But that wouldn't explain the low CPU usage, would it?

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

        Well, honestly, I don't know. Computers are strange little boxes - and with them (even more than with other things), the more you know about them, the more you realise how little you know.

        Maybe Perl itself is just that optimized :) Just kidding, of course. Unfortunately I (still) don't have much experience with code benchmarking - so I wrote that as just an idea ... a different point of view on the problem.

Node Type: perlquestion [id://477987]
Approved by Tanktalus
Front-paged by Tanktalus