PerlMonks
Increasing CPU Usage/Decreasing Run Time

by NathanE (Beadle)
on Jul 25, 2005 at 21:42 UTC [id://477987]

NathanE has asked for the wisdom of the Perl Monks concerning the following question:

Although this is the opposite of many people's typical problem, I need to increase the amount of CPU my program is using.

I have been tasked with converting an old VB6 program to Perl. I have now completed it, and it is working fine (read as: the output is correct). My problem is that it takes several times longer to run in Perl than the original did in VB6. Watching the process, I noticed that it uses only 3-4% of the CPU on average while running.

Is there any way to force Perl to run faster by using more CPU time? The issue can be seen in this profiler output:

    BerkeleyDB::AUTOLOAD has -1 unstacked calls in outer
    BerkeleyDB::__ANON__ has 1 unstacked calls in outer
    Total Elapsed Time = 261.6975 Seconds
      User+System Time = 14.43755 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c Name
     46.1   6.667  6.667 124415   0.0001 0.0001 BerkeleyDB::Common::db_get
     41.0   5.924 15.515      1   5.9238 15.515 main::ProcessLog
     15.0   2.167  8.834 124415   0.0000 0.0001 BerkeleyDB::_tiedHash::FETCH
     3.89   0.561  0.561  22983   0.0000 0.0000 BerkeleyDB::Common::db_put
     1.36   0.197  0.757  22983   0.0000 0.0000 BerkeleyDB::_tiedHash::STORE
     0.11   0.016  0.016      3   0.0053 0.0053 vars::BEGIN
     0.11   0.016  0.016      1   0.0160 0.0160 XSLoader::load
     0.10   0.015  0.015      1   0.0150 0.0150 BerkeleyDB::Term::close_everything
     0.10   0.015  0.015      5   0.0030 0.0030 BerkeleyDB::Hash::_db_open_hash
     0.00       - -0.000      1        -      - DynaLoader::dl_load_file
     0.00       - -0.000      1        -      - DynaLoader::dl_undef_symbols
     0.00       - -0.000      1        -      - DynaLoader::dl_find_symbols
     0.00       - -0.000      1        -      - DynaLoader::dl_install_xsub
     0.00       - -0.000      1        -      - BerkeleyDB::bootstrap
     0.00       - -0.000      1        -      - BerkeleyDB::constant

The main thing there being that only 14 CPU seconds are used out of 261 seconds of real run time. This was done on a file significantly smaller than production, so I'm sure you can see how expensive this becomes as the input grows.

Any help on getting this moving at a more appropriate pace would be greatly appreciated. I am thinking about dividing the input file into multiple files and running multiple threads to speed this up, but would like to see some potential alternatives first.
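The splitting idea mentioned above could be sketched roughly as follows. This is hypothetical and assumes the records are independent of one another; it pre-creates three tiny chunk files as stand-ins for a pre-split log, fans out one child process per chunk with fork(), and has each child write its result (here, just a line count) to its own output file to avoid sharing state between processes.

```perl
use strict;
use warnings;

# Create three tiny demo chunk files (stand-ins for a pre-split log).
for my $i (1 .. 3) {
    open my $out, '>', "part$i.log" or die "Cannot write part$i.log: $!";
    print {$out} "record $_\n" for 1 .. 5;
    close $out;
}

my @chunks = map { "part$_.log" } 1 .. 3;
my @pids;

for my $chunk (@chunks) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {             # child: process one chunk, then exit
        process_chunk($chunk);
        exit 0;
    }
    push @pids, $pid;            # parent: remember the child
}
waitpid($_, 0) for @pids;        # wait for every child to finish

sub process_chunk {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot open $file: $!";
    my $count = 0;
    $count++ while <$in>;
    close $in;
    (my $out_name = $file) =~ s/\.log$/.count/;
    open my $out, '>', $out_name or die "Cannot write $out_name: $!";
    print {$out} "$count\n";
    close $out;
}
```

Note that if the bottleneck really is disk IO rather than CPU, parallel workers may just contend for the same disk, so this is worth measuring before committing to it.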

Thanks!

Replies are listed 'Best First'.
Re: Increasing CPU Usage/Decreasing Run Time
by BrowserUk (Patriarch) on Jul 25, 2005 at 22:03 UTC

    Sounds like you're accessing a large volume of data through a small buffer (thereby forcing lots of IO)?

    What options are you using when you create/open your DB?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
      Yeah - that would make sense as well. The only thing that contradicts that is that I'm using tied hashes. I've untied the actual database from them and let them run just as plain hashes, and still encountered the same problem.

      Here's how I open the BerkeleyDB hash tables, though:

          tie %$file, "BerkeleyDB::Hash",
              -Filename => $file,
              -Flags    => DB_CREATE
              or die "Cannot open file\n";
      ############ UPDATE #############

      I just re-ran, again, with those hashes untied. That's exactly what the problem is: it's taking too much IO to the hash table. Any idea how to speed this up? I'm thinking about a process that dumps to the hash table after all the hard processing is done.

      Comments/Suggestions?
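        A minimal sketch of that "dump at the end" idea: do all the heavy processing against a plain in-memory hash, then write the results out through the tie in one pass when the work is done. Since the tie interface has the same shape, this sketch uses the core SDBM_File module as a stand-in for BerkeleyDB::Hash so it runs anywhere; the key names and the "processing" loop are made up.

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;

my %work;    # plain in-memory hash: fast, no per-access disk IO

# ... all the heavy processing happens here, against %work only ...
$work{"key$_"} = $_ * 2 for 1 .. 1000;   # stand-in for the real work

# One pass of writes at the end, instead of disk IO on every access.
tie my %db, 'SDBM_File', 'results', O_RDWR | O_CREAT, 0666
    or die "Cannot open results: $!";
$db{$_} = $work{$_} for keys %work;
untie %db;
```

The trade-off is memory: this only works if the whole working set fits in RAM, which is presumably why the tied-hash approach was used in the first place.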

        I've used DB_File rather than BerkeleyDB::Hash, but I assume that the options available are similar. They are mentioned for the different DB types as part of the DB_File docs. I would assume that you are using what DB_File refers to as a DB_File::HASHINFO. The parameters you probably need to consider varying are cachesize, bsize & ffactor.

        However, the DB_File docs give no information on how to vary these options for performance, e.g. an ever-bigger cache does not always yield better performance.

        Optimising the options requires a fairly keen understanding of the nature of your data and the usage patterns of your application.

        I did find that the documentation here, particularly section 2, was useful, but be prepared for doing a lot of experimentation.

        The best guide I found to performance tuning was this page. Unfortunately, much of the advice relies upon your having access to one or more of the Berkeley DB utilities, which I never located for Win32. Nonetheless, the information on that page proved very useful as a guide to some trial & error testing.
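        For BerkeleyDB::Hash specifically, the analogous knobs can be passed at tie time. A hypothetical sketch follows: the option names (-Cachesize, -Ffactor, -Nelem) are real BerkeleyDB.pm options, but the filename and the values shown here are invented and would need tuning against real data and access patterns.

```perl
use strict;
use warnings;
use BerkeleyDB;

my $file = 'data.db';    # hypothetical filename

tie my %h, 'BerkeleyDB::Hash',
    -Filename  => $file,
    -Flags     => DB_CREATE,
    -Cachesize => 16 * 1024 * 1024,   # 16 MB cache instead of the default
    -Ffactor   => 1024,               # desired density (keys per bucket)
    -Nelem     => 200_000             # estimate of the final element count
    or die "Cannot open $file: $! $BerkeleyDB::Error\n";
```

        As noted above, bigger is not automatically better for the cache; measure each change.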


        perrin has written some great stuff on optimizing BerkeleyDB cache sizes and other IO related stuff. I'd suggest doing a Super Search on stuff written by perrin that also uses the word BerkeleyDB. There's a bunch in there and I'm not going to pre-filter it for you right now.
find the IO break
by Anonymous Monk on Jul 26, 2005 at 08:45 UTC
    Since it's not the CPU that slows down your program, it must be filesystem IO. But since I don't know what IO your program generates, there's no way I can help. Other people have already pointed at DB optimizations, but I wouldn't be surprised if your program does some other kind of IO that really slows it down.
Re: Increasing CPU Usage/Decreasing Run Time
by Anonymous Monk on Jul 26, 2005 at 09:17 UTC
    If there's a large discrepancy between your wall clock time and the sum of your user and system time, it (usually) means something else is being done. It could be that the CPU is giving other programs a chance to run as well. It could be that the program is waiting for the disk. It could be that it's waiting for the network.

    But whatever it is, it's not something that can be determined from here with the output of the profiler. The profiler just shows the breakdown of how the program spent the allotted CPU time - it doesn't give any indication why it didn't get the CPU for the other 247 seconds.

    If you have a real OS, like Solaris or HP-UX, you can use tools like dtrace(1) or glance, which will give you a lot of information.

      From a purely windows perspective, there are a lot of tools that can help as well.

      The excellent Process Explorer from Sysinternals will allow you to see what perl.exe has threaded beneath it. I often find with lots of disk-intensive operations that Perl itself takes up very little CPU, but the kernel is running quite high (as it's the kernel that's handling the disk IO).

      You may also want to try bumping up the thread priority (vanilla task manager can do that), but I doubt it will make much difference.

Re: Increasing CPU Usage/Decreasing Run Time
by hakkr (Chaplain) on Jul 26, 2005 at 07:42 UTC

    On Linux/Unix you should be able to 'nice' the process to affect its scheduling priority (how much CPU time it gets).

    You usually have to be root to nice upwards and increase priority, so try 'nice -19 script.pl'; the max value for nice is -20, so that should give your script priority over whatever is hogging the CPU. You can nice processes via the 'top' interface as well. Update: as pointed out below, nice rather annoyingly uses -20 as the highest priority.
      Firstly, the numbers for nice work the other way round, i.e. 20 means run at the lowest priority. Secondly, this can do no good -- programs don't slow down just to be awkward, such that a nicer priority could persuade them to hurry up; there is always some other reason for it.

      In this case the process is spending too much time waiting for database interactions to complete and change of priority cannot change that behaviour. But if it hadn't been that it would have been some other I/O activity.

      Finally, on Unix, 20 or more is equivalent to 19, which is the maximum value, i.e. the lowest priority.

      One world, one people

        I know programs don't slow down to be awkward, but they do slow down if another process with a higher priority is hogging the CPU.
Re: Increasing CPU Usage/Decreasing Run Time
by qq (Hermit) on Jul 26, 2005 at 22:05 UTC

    Try also dprofpp -r. This will show the "real" (wall-clock) times of the subroutines, and show you where those 200+ seconds are used.

Re: Increasing CPU Usage/Decreasing Run Time
by techcode (Hermit) on Jul 26, 2005 at 14:40 UTC
    I don't know why no one has mentioned this - but maybe you're using the wrong code?

    Just a thought ... but if you literally translated from VB to Perl, it's probably not optimised for the Perl way of doing things ...

      But that wouldn't explain the low CPU usage, would it?

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

        Well, honestly, I don't know. Computers are strange little boxes - and with them (even more than with other things), the more you know about them, the more you realise how little you know.

        Maybe Perl itself is just that optimized :) Just kidding, of course. Unfortunately I (still) don't have much experience with code benchmarking - so I wrote that as just an idea ... a different point of view on the problem.

Node Type: perlquestion [id://477987]
Approved by Tanktalus
Front-paged by Tanktalus