Re: performance of counting keys in a big hash

Replies are listed 'Best First'.
Re^2: performance of counting keys in a big hash by space_monk (Chaplain) on May 22, 2013 at 11:54 UTC
A good piece of research, however the possibility exists that the two results are related and generating the first one also pre-generates the second one. I suggest that tests like these are best done in separate methods on separate data just to be sure. `#!/usr/bin/perl use strict; use warnings; use Benchmark qw(:all) ; my $n = 100000; our %hash1; @hash1{1..$n} = 1 x $n; my $n2 = 100000; our %hash2; @hash2{1..$n2} = 1 x $n2; my $n3 = 1000000; our %hash3; @hash3{1..$n3} = 1 x $n3; cmpthese( 1000, { 'Method 1' => sub { my @k = keys %hash1; my $v = scalar @k; }, '100K' => sub { my $v = scalar keys %hash2; }, '1 Mill' => sub { my $v = scalar keys %hash3; } } );` [download] Results: `Rate Method 1 1 Mill + 100K Method 1 15.0/s -- -100% + -100% 1 Mill 999999999999999872/s 6645700000000000000% -- + 0% 100K 999999999999999872/s 6645700000000000000% 0% + --` [download] Note result appears to be same for 100k and 1Million entries in hash. If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)	[reply] [d/l] [select]
Re^3: performance of counting keys in a big hash by hdb (Monsignor) on May 22, 2013 at 12:13 UTC
This is a valuable comment, I had not thought of that. I have reversed the order and get the same results. Does this provide enough proof or could there be a caveat as well? To be really sure one would have to run separate scripts?	[reply]
Re^4: performance of counting keys in a big hash by space_monk (Chaplain) on May 22, 2013 at 12:19 UTC
I'm just in the process of tidying up my comment by putting in some code that actually works, as opposed to that I first thought of! :-). I think reversing the tests is probably enough proof; I don't believe separate scripts would prove anything extra. I am using Benchmark to compare my results instead of using manual timing. If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)	[reply]
Re^2: performance of counting keys in a big hash by demerphq (Chancellor) on May 22, 2013 at 13:16 UTC
Use the force Luke! `/* hash structure: / / This structure must match the beginning of struct xpvmg in sv.h. / struct xpvhv { HV xmg_stash; /* class package / union _xmgu xmg_u; STRLEN xhv_keys; / total keys, including placeholders +/ STRLEN xhv_max; / subscript of last element of xhv_ar +ray / };` [download] and / the number of keys (including any placeholders) / #define XHvTOTALKEYS(xhv) ((xhv)->xhv_keys) / * HvKEYS gets the number of keys that actually exist(), and is provid +ed * for backwards compatibility with old XS code. The core uses HvUSEDK +EYS * (keys, excluding placeholders) and HvTOTALKEYS (including placehold +ers) / #define HvKEYS(hv) HvUSEDKEYS(hv) #define HvUSEDKEYS(hv) (HvTOTALKEYS(hv) - HvPLACEHOLDERS_get( +hv)) #define HvTOTALKEYS(hv) XHvTOTALKEYS((XPVHV) SvANY(hv)) #define HvPLACEHOLDERS(hv) (Perl_hv_placeholders_p(aTHX_ MUTABLE +_HV(hv))) #define HvPLACEHOLDERS_get(hv) (SvMAGIC(hv) ? Perl_hv_placeholders_ge +t(aTHX_ (const HV )hv) : 0) [download] HvKEYS(hv) is the thing that implements `scalar keys %hash` --- $world=~s/war/peace/g	[reply] [d/l] [select]
Re^3: performance of counting keys in a big hash by hdb (Monsignor) on May 22, 2013 at 13:23 UTC
I think I will have to carry you around the swamps of some unknown planet for much longer before I will ready to dive into Perl's source itself to answer questions in this community...	[reply]
Re^4: performance of counting keys in a big hash by demerphq (Chancellor) on May 22, 2013 at 13:44 UTC
:-) Seriously tho, its not as hard as people think. In fact a lot of the time its easy. All of the code for perl's hash implementation lives in three files, hv.h, hv_func.h, hv.c. hv_func.h is the home of the new hash functions if you want to look at the options. --- $world=~s/war/peace/g	[reply]
Re^3: performance of counting keys in a big hash by space_monk (Chaplain) on May 22, 2013 at 14:55 UTC
Dammit Jim, I'm a doctor, not a C programmer! :-) _{ok it's a little cross genre, but what the hell} If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)	[reply]


Think about Loose Coupling
	PerlMonks