Re: Increasing CPU Usage/Decreasing Run Time
by BrowserUk (Patriarch) on Jul 25, 2005 at 22:03 UTC
|
Sounds like you're accessing a large volume of data through a small buffer (thereby forcing lots of IO)?
What options are you using when you create/open your DB?
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" may be good enough for now, and perfection may be unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
| [reply] |
|
Yeah - that would make sense as well. The only thing that contradicts that is that I'm using tied hashes. I've untied the actual database from them and let them run just as plain hashes, and still encountered the same problem.
Here's how I open the BerkeleyDB hash tables though;
tie %$file, "BerkeleyDB::Hash",
    -Filename => $file,
    -Flags    => DB_CREATE
    or die "Cannot open file '$file': $! $BerkeleyDB::Error\n";
############ UPDATE #############
I just re-ran, again, with those hashes untied. That's exactly what the problem is. It's taking too much IO to the hash table. Any idea how to speed this up? I'm thinking about a process that dumps to the hash table after all the hard processing is done.
Comments/Suggestions?
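One way to sketch that "dump afterwards" idea (the filename, get_record() and process() are hypothetical stand-ins for your own code; the 8MB cache size is illustrative, not a recommendation):

```perl
use strict;
use warnings;
use BerkeleyDB;

# Do all the heavy processing against a plain in-memory hash,
# so there is no per-store disk IO while the real work runs.
my %results;
while ( my $record = get_record() ) {          # hypothetical input loop
    $results{ $record->{key} } = process($record);
}

# Then write everything out in one sequential pass, with a larger
# cache to cut the remaining IO.
tie my %db, 'BerkeleyDB::Hash',
    -Filename  => 'results.db',
    -Flags     => DB_CREATE,
    -Cachesize => 8 * 1024 * 1024              # tune for your data
    or die "Cannot open results.db: $! $BerkeleyDB::Error\n";

$db{$_} = $results{$_} for keys %results;
untie %db;
```

This assumes the working set fits in memory; if it doesn't, tuning -Cachesize on the tied hash itself is the next thing to try.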
| [reply] |
|
I've used DB_File rather than BerkeleyDB::Hash, but I assume that the options available are similar. They are mentioned for the different DB types here as a part of the DB_File docs. I would assume that you are using what DB_File refers to as a DB_File::HASHINFO. The parameters you probably need to consider varying are the cachesize, bsize & ffactor.
However, the DB_File docs give no information on how to vary these options for performance. E.g. an ever-bigger cache does not always yield better performance.
Optimising the options requires a fairly keen understanding of the nature of your data and the usage patterns of your application.
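For reference, a sketch of setting those parameters through DB_File's exported $DB_HASH (a DB_File::HASHINFO object); the values here are purely illustrative starting points for experimentation:

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;

# Tune the hash parameters *before* tying; illustrative values only.
$DB_HASH->{cachesize} = 4 * 1024 * 1024;   # 4MB in-memory cache
$DB_HASH->{bsize}     = 8192;              # bucket size in bytes
$DB_HASH->{ffactor}   = 128;               # desired keys per bucket

tie my %h, 'DB_File', 'data.db', O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Cannot open data.db: $!\n";
```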
I did find that the documentation here, particularly section 2, was useful, but be prepared for doing a lot of experimentation.
The best guide I found to performance tuning was this page. Unfortunately, much of the advice relies upon your having access to one or more of the Berkeley DB utilities, which I never located for Win32. Nonetheless, the information on that page proved very useful as a guide to some trial & error testing.
| [reply] [d/l] [select] |
|
perrin has written some great stuff on optimizing BerkeleyDB cache sizes and other IO related stuff. I'd suggest doing a Super Search on stuff written by perrin that also uses the word BerkeleyDB. There's a bunch in there and I'm not going to pre-filter it for you right now.
| [reply] |
|
find the IO break
by Anonymous Monk on Jul 26, 2005 at 08:45 UTC
|
Since it's not the CPU that slows down your program, it must be the filesystem IO. But since I don't know what IO your program generates, there's no way I can help. Other people have already pointed at DB optimizations, but I wouldn't be surprised if your program does some other kind of IO that really slows it down. | [reply] |
Re: Increasing CPU Usage/Decreasing Run Time
by Anonymous Monk on Jul 26, 2005 at 09:17 UTC
|
If there's a large discrepancy between your wall clock time and the sum of your user and system time, it (usually) means something else is being done. It could be that the CPU is giving other programs a chance to run as well. It could be that the program is waiting for the disk. It could be it's waiting for the network.
But whatever it is, it's not something that can be determined from here, from the output of the profiler. The profiler just shows the breakdown of how the program spent the allotted CPU time - it doesn't give any indication why it didn't get the CPU for 247 seconds.
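A quick way to see that gap is plain time(1); the figures below are illustrative, not from the OP's run:

```
$ time perl script.pl

real    4m32.1s    # wall clock: total elapsed time
user    0m25.3s    # CPU time spent in user space
sys     0m4.8s     # CPU time spent in the kernel
```

When real is much larger than user + sys, the process spent most of its life waiting (disk, network, or other processes) rather than computing.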
If you have a real OS, like Solaris or HP-UX, you can use tools like dtrace(1) or glance, which will give you a lot of information. | [reply] |
|
From a purely windows perspective, there are a lot of tools that can help as well.
The very excellent Process Explorer from Sysinternals will allow you to see what perl.exe has threaded beneath it - I often find with lots of disk-intensive operations that Perl itself takes up very little CPU, but kernel time runs quite high (since it's the kernel that handles the disk IO).
You may also want to try bumping up the thread priority (vanilla task manager can do that), but I doubt it will make much difference.
| [reply] |
Re: Increasing CPU Usage/Decreasing Run Time
by hakkr (Chaplain) on Jul 26, 2005 at 07:42 UTC
|
On Linux/Unix you should be able to 'nice' the process to affect its scheduling priority (how much CPU time it gets).
You usually have to be root to nice upwards and increase priority.
So try 'nice -19 script.pl'; the max value for nice is -20, so that should give your script priority over whatever is hogging the CPU.
You can nice processes via 'top' interface as well.
Update: as pointed out below, nice rather annoyingly uses -20 as the highest priority.
| [reply] |
|
Firstly, the numbers for nice work the other way round, i.e. 20 means run at the lowest priority. Secondly, this can do no good -- programs don't slow down to be awkward, such that a nicer priority could persuade them to hurry up; there is always some other reason for it. In this case the process is spending too much time waiting for database interactions to complete, and a change of priority cannot alter that behaviour. But if it hadn't been that, it would have been some other I/O activity.
Finally, on unix, 20 or more is equivalent to 19, which is the maximum value, i.e. the lowest priority.
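For the record, a sketch of the unambiguous -n form (the script name and PID are hypothetical):

```shell
# Lower priority (no root needed): niceness 19 is the lowest
nice -n 19 perl script.pl

# Raise priority (root required): niceness -20 is the highest
sudo nice -n -20 perl script.pl

# Adjust an already-running process by PID
sudo renice -n -10 -p 12345
```

Either way, as noted above, this only redistributes CPU time; it does nothing for a process that is blocked on IO.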
| [reply] |
|
I know programs don't slow down to be awkward, but they do slow down if another process with a higher priority is hogging the CPU.
| [reply] |
|
Re: Increasing CPU Usage/Decreasing Run Time
by qq (Hermit) on Jul 26, 2005 at 22:05 UTC
|
Try also dprofpp -r. This will show the "real" times of the subroutines, and show you where those 200+ seconds are used.
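That is, assuming the script was profiled with Devel::DProf, something like:

```shell
perl -d:DProf script.pl    # run under the profiler; writes tmon.out
dprofpp -r                 # report sorted by elapsed real (wall clock) time
```

Sorting by real time rather than CPU time makes the subroutines that block on IO float to the top of the report.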
| [reply] [d/l] |
Re: Increasing CPU Usage/Decreasing Run Time
by techcode (Hermit) on Jul 26, 2005 at 14:40 UTC
|
I don't know why no one has mentioned this - but maybe you're using the wrong code?
Just a thought ... but if you literally translated from VB to Perl, it's probably not optimised for the Perl way of doing things ... | [reply] |
|
But that wouldn't explain the low CPU usage, would it?
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
| [reply] |