
Why does filling a hash seem slower on Unix than on Windows?

by vmpilgrim (Initiate)
on Jan 03, 2008 at 17:00 UTC ([id://660247])

vmpilgrim has asked for the wisdom of the Perl Monks concerning the following question:

I am loading 80,000+ lines from a file into a hash. I ran the same script on Windows, Linux and AIX. On Windows the script (along with some other pre- and post-processing) took about 25 seconds; on Linux it took 135 seconds! I profiled the code and observed that on Unix, loading the hash takes about 85% of the run time, whereas on Windows it took less than 50%. Is this the default behaviour? Are hashes slower on Unix?
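For readers following along, here is a minimal sketch of the kind of loading loop the question describes. The original script was not posted, so the input format (one tab-separated key/value pair per line) and the sub name `load_hash` are assumptions; an in-memory filehandle stands in for the real file so the sketch runs on its own.

```perl
use strict;
use warnings;

# Hypothetical record format: one "key<TAB>value" pair per line.
sub load_hash {
    my ($fh) = @_;
    my %h;
    while (my $line = <$fh>) {                     # read one line at a time
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;
        next unless defined $key && length $key;   # skip blank lines
        $h{$key} = $value;
    }
    return \%h;
}

# Demo: an in-memory "file" of 80,000 lines keeps the sketch self-contained.
my $data = join '', map { "key$_\tvalue$_\n" } 1 .. 80_000;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $h = load_hash($fh);
close $fh;

printf "Loaded %d keys\n", scalar keys %$h;   # prints: Loaded 80000 keys
```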

Re: why filling Hash seems slower on unixes than windows ?
by Old_Gray_Bear (Bishop) on Jan 03, 2008 at 18:27 UTC
    You said: "the script in windows (along with some other pre and post processing) took about 25 seconds ..and on linux it took 135 seconds". Are those wall-clock times? Or did you actually benchmark the code (use Benchmark;)? Wall-clock times are notorious for varying all over the map, depending on what else is running on the machine.

    I had a 'Lead Programming Manager' once tell me that he'd "benchmarked some RPC code I wrote and it was giving him unacceptable response times" -- thirty seconds between the time he pressed the Enter key and the time the browser got updated. After some investigation, I found the bug: a small error in his laptop's DNS configuration. The first two DNS servers in his authoritative list were non-existent, due to transposition errors, and the DNS time-out was set to 15 seconds. Once we fixed that, the response time he measured was sub-second. An actual benchmark gave 0.2 seconds from start to finish. (That was about his reaction time, pushing the start/stop timer button on his wristwatch....)

    When you start saying things about program timings, you have to be sure that you are really measuring the code's performance and not side effects of the other things on the machine.
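    One way to separate the code's own cost from the machine's is to compare CPU time with wall-clock time. The sketch below uses a hypothetical `fill_hash` workload (an assumption standing in for the real loading code) with the core Benchmark module's `timeit`/`timestr`, which report wall-clock seconds next to user and system CPU seconds, and then does the same split by hand with the `times` builtin and Time::HiRes. If wall-clock time is much larger than CPU time, the process spent that gap waiting (disk, paging, other processes) rather than filling the hash.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);
use Time::HiRes ();

# Hypothetical workload standing in for the real hash-loading code.
sub fill_hash {
    my ($n) = @_;
    my %h;
    $h{"key$_"} = $_ for 1 .. $n;
    return scalar keys %h;
}

# Benchmark: run the workload 10 times; timestr() prints wall-clock
# seconds next to user + system CPU seconds.
my $t = timeit(10, sub { fill_hash(80_000) });
print "fill_hash x 10: ", timestr($t), "\n";

# The same comparison by hand: times() gives CPU seconds used by this
# process, Time::HiRes::time() gives fine-grained wall-clock seconds.
my $wall0 = Time::HiRes::time();
my ($user0, $sys0) = times;
fill_hash(80_000);
my ($user1, $sys1) = times;
my $wall = Time::HiRes::time() - $wall0;
my $cpu  = ($user1 - $user0) + ($sys1 - $sys0);
printf "wall %.2f s, cpu %.2f s\n", $wall, $cpu;
```

    Run the same script on both boxes: similar CPU numbers but very different wall-clock numbers point at the environment, not at perl's hashes.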

    ----
    I Go Back to Sleep, Now.

    OGB

      And even once you're sure you're comparing at least apples to oranges (rather than apples to giant squid) you need to account for environmental differences. As was alluded to above, environment can greatly affect timing on multiuser boxen.

      If you're comparing benchmark times on your desktop Wintendo box, which is running nothing else concurrently, against benchmark times from a shared CGI server at your ISP, chances are you're not going to get "meaningful" results from the comparison (at least not for meaningful values of "meaningful" :). The box could be oversubscribed and you're getting 75 wall-clock seconds of paging activity on one side; you could have a 10k RPM disk on your desktop while the other box has a 7-year-old 5400 RPM SCSI drive that's the 6th device on a busy cable. Maybe the directory the source data resides in is being pulled over NFS from across the country.

      So no, it's not impossible to imagine plausible scenarios that produce the numbers you quote. It is, however, nearly impossible to say which one applies without more concrete details. But underneath it all, there's nothing in perl itself per se that's going to make hashes slower or faster on any one implementation (again, it's how that implementation interacts with its environment that's going to introduce variances).

      Update: And just to expand on "meaningful results": if your target box takes 135 seconds, then your target box takes 135 seconds; comparing against your desktop number doesn't make the other system run any faster. It doesn't make much sense to get your desktop system's time down if the implementation you use misbehaves where you really need it to run (say, slurping entire files into RAM, which works fine until you toss it on the other box that starts to thrash if you load more than ~2M of data in any one process). Look at your algorithm, how you're processing the data, and why that's taking as long as it is on your target box, and fix that instead.

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

Re: why filling Hash seems slower on unixes than windows ?
by gamache (Friar) on Jan 03, 2008 at 17:26 UTC
    Have you tried pre-sizing your hash by using keys in lvalue context?
    my %h; keys %h = 80000; ...
    This could give you a big performance boost on both platforms.
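    To check whether presizing helps on a given box, it is worth measuring it there rather than assuming. This is a sketch with synthetic keys (`key1` .. `key80000`, an assumption standing in for the real data) and hypothetical sub names, comparing the two fills with Benchmark's `cmpthese`. Per perlfunc, `keys %h = N` allocates at least N buckets (rounded up to the next power of two), so the hash does not have to grow its bucket array and redistribute the existing keys repeatedly while it is being filled. How large the difference is will vary with the perl version, the platform and the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $n    = 80_000;
my @keys = map { "key$_" } 1 .. $n;   # synthetic keys, built once up front

# Without presizing: perl grows the bucket array as the hash fills.
sub fill_plain {
    my %h;
    $h{$_} = 1 for @keys;
    return \%h;
}

# With presizing: keys() in lvalue context preallocates the buckets.
sub fill_presized {
    my %h;
    keys(%h) = $n;
    $h{$_} = 1 for @keys;
    return \%h;
}

# Negative count: run each sub for at least 1 CPU second, then print
# a rate table comparing the two.
cmpthese(-1, {
    plain    => \&fill_plain,
    presized => \&fill_presized,
});
```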
      Yes, I used this and the hash performance improved a lot: it now takes only 5 or 6 seconds to fill the hash. Thanks a lot!
Re: why filling Hash seems slower on unixes than windows ?
by jbert (Priest) on Jan 04, 2008 at 01:20 UTC
    On the same hardware that would be surprising. Do the two boxes have different CPUs? If so, could you give us more details?
      The development PC I use is a single-CPU Windows XP machine with 2 GB of RAM. The target machine on which I test the script is a standalone workstation with 4 x86_64 CPUs, perhaps 2 GB+ of RAM, and SUSE Linux. I also tested the same script on a virtual machine running Red Hat Linux with 512 MB of RAM. Anyway, with the hash presized, the problem is solved.
