
Perl Hash Performance Hits Brick Wall!

by BrianP (Acolyte)
on Aug 17, 2015 at 04:41 UTC (#1138824=perlquestion)

BrianP has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality. You may see it by logging in.

Replies are listed 'Best First'.
Re: Perl Hash Performance Hits Brick Wall!
by shmem (Chancellor) on Aug 17, 2015 at 06:26 UTC
    When passing the hash back to the calling function, the process stalled indefinitely but there was a great deal of memory usage in the background and one CPU saturated.

    First, please don't put your entire post into <c></c> or <code></code> tags. These are reserved for code.

    Second, you are probably passing the hash back like this:

    return %hash;

    which passes back a long, long, really long list of tuples (key/value pairs). Pass back a reference.

    return \%hash;

    See perlref.
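
    A minimal sketch of the difference, assuming a hypothetical build_counts() helper and made-up keys (neither is from the original post): the sub returns a reference, so the caller receives a single scalar rather than a flattened list of every key and value.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a hash and returns a reference to it,
# so the caller receives one scalar instead of a flattened list.
sub build_counts {
    my ($n) = @_;
    my %count;
    $count{ sprintf "%06d", $_ }++ for 1 .. $n;
    return \%count;    # the hash itself is not copied
}

my $counts = build_counts(1000);

# Dereference to use the hash on the caller's side.
printf "%d distinct keys\n", scalar keys %$counts;
print "count for 000042: $counts->{'000042'}\n";
```

    On the caller's side, %$counts gives the whole hash and $counts->{$key} a single element.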

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

      which passes back a long, long, really long list of tuples (key/value pairs). Pass back a reference.

      *cough* It's just a flat list; there are no tuples (k/v pairs), or it's one big tuple.

Re: Perl Hash Performance Hits Brick Wall!
by dave_the_m (Monsignor) on Aug 17, 2015 at 06:15 UTC
    I can't reproduce this (linux, 5.20.2):

    use Time::HiRes qw(time);
    my $n = $ARGV[0];
    my %h;
    for (1..$n) {
        my $key = sprintf "%06d", $_;
        $h{$key} = 1;
    }
    my @a;
    my $t1 = time;
    @a = keys %h;
    my $t2 = time;
    printf "time = %.3f; secs per 1M keys: %.3f\n",
        ($t2-$t1), ($t2-$t1) / $n * 1_000_000;

    I get the following, which shows that the time per key is nearly constant:

    $ perl foo 1000000
    time = 0.400; secs per 1M keys: 0.400
    $ perl foo 2000000
    time = 0.845; secs per 1M keys: 0.423
    $ perl foo 4000000
    time = 1.752; secs per 1M keys: 0.438
    $ perl foo 8000000
    time = 3.792; secs per 1M keys: 0.474
    $ perl foo 16000000
    time = 9.213; secs per 1M keys: 0.576


Re: Perl Hash Performance Hits Brick Wall!
by hardburn (Abbot) on Aug 17, 2015 at 18:19 UTC

    To echo what's already been said, please don't format your entire post in code blocks. That should be used for code and program output. I'm sure many people have passed on helping because it was hard for them to read and understand the problem.

    There seems to be a mismatch between your code segment and the example outputs. The first output says:

    CCS: Fsize 216913920, pix 36152320
    ETime=28.648 min, =97.60%, Event 'CCS: Extract RGB2C keys'
    ETime=0.380 min, = 1.30%, Event 'CCS: Read_and_hash'
    ETime=0.325 min, = 1.11%, Event 'CCS: Write_RGBC'
    97.60% -> CCS: Extract RGB2C keys
    1.30% -> CCS: Read_and_hash
    1.11% -> CCS: Write_RGBC
    Elapsed time = 29.58 min

    But the time_event() calls in the snippet are:

    &time_event('CCS: Counting RGB hash keys', \%e2at...
    &time_event('CCS: Extract RGB2C keys', \%e2at,...
    &time_event('CCS: >open Output file', \%e2at,
    &time_event('CCS: Write_RGBC', \%e2at,...
    &time_event('CCS: Close_RGBC', \%e2at, $debug*0);
    printf("CCS: %d bytes written to fn '$ofile'\n", -s $ofile);

    Most of these do not appear in your output. It's also not entirely clear from the code how time_event() works. Depending on how it attributes elapsed time to event labels, you could be measuring the time spent reading the file when you think you're measuring the call to keys.
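
    One way to rule that out is to time each phase with its own start and stop. The sketch below is only an illustration: timed() is a hypothetical helper, not the original time_event(), and the hash is filled with made-up keys rather than read from a file.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper (not the original time_event()): runs one code
# block and returns its wall-clock duration together with its result.
sub timed {
    my ($label, $code) = @_;
    my $start   = time();
    my $result  = $code->();
    my $elapsed = time() - $start;
    printf "%-12s %.3f s\n", $label, $elapsed;
    return ($elapsed, $result);
}

# Phase 1: build the hash (stands in for reading and hashing the file).
my (undef, $h) = timed('build hash', sub {
    my %h;
    $h{ sprintf "%06d", $_ }++ for 1 .. 100_000;
    return \%h;
});

# Phase 2: extract the keys, timed on its own.
my (undef, $keys) = timed('extract keys', sub { return [ keys %$h ] });

printf "%d keys extracted\n", scalar @$keys;
```

    Because each label wraps exactly one block, a slow phase cannot be attributed to its neighbour.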

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

Re: Perl Hash Performance Hits Brick Wall!
by flexvault (Monsignor) on Aug 22, 2015 at 09:27 UTC


    I tried to stay out of this one, but 'what the heck!'

    The type of work you're doing requires a deeper understanding of Perl than you want to spend time on. In your write-up you mention passing a hash from a subroutine back to the main program, but you failed to include that in your limited code description. As others have already pointed out, passing a hashref to the subroutine would have avoided having to copy a 27MM key/value hash back to the main program. As you show below, it looks like it's in the main program anyway.

    Tiny, active code segment ...
    $sr_len = sysread(IN, $buf, $bsize);    # SysRead Length
    last if $sr_len == 0;
    while($buf) {
        $rgb=substr($buf, 0, 6, '');        # Nibble 6 bytes
        $rgb2c{$rgb}++;
    }

    But why didn't you build the array while building your hash???
    @rgb = keys %rgb2c; << 1 line takes 28.648 min
    The above code is probably not doing what you think: '@rgb' is not in any specific order. Knowing how Perl allocates arrays and hashes, you could have done the following (untested code):

    my $fsize   = -s $file;          ## How big is the image? ($file stands for your file's path)
    my $arrsize = int($fsize / 6);   ## Number of 6-byte pixels: size of the array and hash
    my $counter = 0;
    my %rgb2c;
    keys(%rgb2c) = $arrsize;         ## Preallocate the hash buckets
    my @rgb;
    $#rgb = $arrsize - 1;            ## Preallocate the array
    while ( 1 ) {
        $sr_len = sysread(IN, $buf, $bsize);   # SysRead Length
        last if $sr_len == 0;
        while ( length $buf ) {
            $rgb = substr($buf, 0, 6, '');     # Nibble 6 bytes
            $rgb2c{$rgb}++;
            $rgb[$counter] = $rgb;             # Build array as you go along
            $counter++;
        }
    }

    At this point you have a hash for telling you the number of colors and an array that represents the exact image in 48 bit increments. By pre-allocating the hash and array you make only one call to the operating system for memory for each, instead of millions of calls.

    Spend a little more time learning Perl and using efficient algorithms, and you'll have tools that will make you proud.


    "Well done is better than well said." - Benjamin Franklin

Re: Perl Hash Performance Hits Brick Wall!
by Anonymous Monk on Aug 17, 2015 at 05:53 UTC
    There are no such things as brick walls, lalalalala
