Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re: performance of counting keys in a big hash

by hdb (Monsignor)
on May 22, 2013 at 09:16 UTC ( [id://1034710]=note: print w/replies, xml ) Need Help??


in reply to performance of counting keys in a big hash

The documentation of keys says: "In scalar context, returns the number of keys or indices." Based on this I would say that keys has a shortcut to return the number of keys w/o building an array first. A little experiment seems to confirm this:

use strict; use warnings; use Time::HiRes qw/time/; my %hash; my $n = 1000000; @hash{1..$n} = 1 x $n; my $s = time(); my @k = keys %hash; print scalar @k, ": "; print time()-$s, "\n"; $s = time(); print scalar keys %hash, ": "; print time()-$s, "\n";

The second version is vastly faster.

Replies are listed 'Best First'.
Re^2: performance of counting keys in a big hash
by space_monk (Chaplain) on May 22, 2013 at 11:54 UTC

    A good piece of research, however the possibility exists that the two results are related and generating the first one also pre-generates the second one. I suggest that tests like these are best done in separate methods on separate data just to be sure.

    #!/usr/bin/perl use strict; use warnings; use Benchmark qw(:all) ; my $n = 100000; our %hash1; @hash1{1..$n} = 1 x $n; my $n2 = 100000; our %hash2; @hash2{1..$n2} = 1 x $n2; my $n3 = 1000000; our %hash3; @hash3{1..$n3} = 1 x $n3; cmpthese( 1000, { 'Method 1' => sub { my @k = keys %hash1; my $v = scalar @k; }, '100K' => sub { my $v = scalar keys %hash2; }, '1 Mill' => sub { my $v = scalar keys %hash3; } } );
    Results:
    Rate Method 1 1 Mill + 100K Method 1 15.0/s -- -100% + -100% 1 Mill 999999999999999872/s 6645700000000000000% -- + 0% 100K 999999999999999872/s 6645700000000000000% 0% + --
    Note result appears to be same for 100k and 1Million entries in hash.
    If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)

      This is a valuable comment, I had not thought of that. I have reversed the order and get the same results. Does this provide enough proof or could there be a caveat as well?

      To be really sure one would have to run separate scripts?

        I'm just in the process of tidying up my comment by putting in some code that actually works, as opposed to that I first thought of! :-).

        I think reversing the tests is probably enough proof; I don't believe separate scripts would prove anything extra.

        I am using Benchmark to compare my results instead of using manual timing.

        If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)
Re^2: performance of counting keys in a big hash
by demerphq (Chancellor) on May 22, 2013 at 13:16 UTC

    Use the force Luke!

    /* hash structure: */ /* This structure must match the beginning of struct xpvmg in sv.h. */ struct xpvhv { HV* xmg_stash; /* class package */ union _xmgu xmg_u; STRLEN xhv_keys; /* total keys, including placeholders +*/ STRLEN xhv_max; /* subscript of last element of xhv_ar +ray */ };

    and

    /* the number of keys (including any placeholders) */ #define XHvTOTALKEYS(xhv) ((xhv)->xhv_keys) /* * HvKEYS gets the number of keys that actually exist(), and is provid +ed * for backwards compatibility with old XS code. The core uses HvUSEDK +EYS * (keys, excluding placeholders) and HvTOTALKEYS (including placehold +ers) */ #define HvKEYS(hv) HvUSEDKEYS(hv) #define HvUSEDKEYS(hv) (HvTOTALKEYS(hv) - HvPLACEHOLDERS_get( +hv)) #define HvTOTALKEYS(hv) XHvTOTALKEYS((XPVHV*) SvANY(hv)) #define HvPLACEHOLDERS(hv) (*Perl_hv_placeholders_p(aTHX_ MUTABLE +_HV(hv))) #define HvPLACEHOLDERS_get(hv) (SvMAGIC(hv) ? Perl_hv_placeholders_ge +t(aTHX_ (const HV *)hv) : 0)

    HvKEYS(hv) is the thing that implements scalar keys %hash

    ---
    $world=~s/war/peace/g

      I think I will have to carry you around the swamps of some unknown planet for much longer before I will ready to dive into Perl's source itself to answer questions in this community...

        :-)

        Seriously tho, its not as hard as people think. In fact a lot of the time its easy. All of the code for perl's hash implementation lives in three files, hv.h, hv_func.h, hv.c.

        hv_func.h is the home of the new hash functions if you want to look at the options.

        ---
        $world=~s/war/peace/g

      Dammit Jim, I'm a doctor, not a C programmer! :-) ok it's a little cross genre, but what the hell

      If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1034710]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (7)
As of 2024-04-18 17:15 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found