Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re^2: Fastest way to lookup a point in a set

by eyepopslikeamosquito (Archbishop)
on Aug 05, 2017 at 23:30 UTC ( [id://1196828]=note: print w/replies, xml ) Need Help??


in reply to Re: Fastest way to lookup a point in a set
in thread Fastest way to lookup a point in a set

On a side note: are you aware of multi-dim hashes in Perl?
No, I wasn't aware of them. Curiously, when I added them to my original test, they turned out to be slower than join. That is, Multi:
exists $cells->{$p->[0],$p->[1]} or die;
was slower than Str:
exists $cells->{ join ':', @{$p} } or die;
Benchmark results:
Str: 7 wallclock secs ( 6.58 usr + 0.00 sys =.58 CPU) @ 30404.38/s ( +n=200000) Multi: 8 wallclock secs ( 7.98 usr + 0.00 sys =.98 CPU) @ 25050.10/s ( +n=200000)

Update: though Multi was faster than:

exists $cells->{ join ':', $p->[0], $p->[1] } or die;

Replies are listed 'Best First'.
Re^3: Fastest way to lookup a point in a set
by LanX (Saint) on Aug 05, 2017 at 23:44 UTC
    > they turned out to be slower than join.

    Interesting, I think that's because you address the points separately in multi (2 lookups) while join can optimize over an array (one list returned)

    Maybe try to see how it compares with join over 2 elements.

    I wouldn't be surprised if the result is equally fast. Multi dimensional hashes are an old (Perl 4) but somehow obscure feature. They probably never optimized it over the equivalence to join.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

      There is also the issue of locales and the radix char when non-integer values are used with multidimensional hash keys.

      If one has coordinates

      [5, 5.5] [5.5, 5 ]
      then using the multidimensional hash approach in a locale where both $; and the radix char are a comma would result in ambiguous keys:
      {5,5,5} {5,5,5}

      I know the original post specifies signed ints, so I'm just being more generic here. I also haven't checked if there is locale specific behaviour of $;

        From the docs: The default subscript separator is "\034", the same as SUBSEP in awk

        It's not "," , that's just the placeholder in Perl's syntax.

        I don't know and can't think of any number format or localization which uses chr(28) .

        Printable characters start from chr(32) onwards.

        update

        to avoid confusion

        hex, oct, dez

        perl -e "print 0x1c,034,28" 282828

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Je suis Charlie!

Re^3: Fastest way to lookup a point in a set (updated)
by LanX (Saint) on Aug 06, 2017 at 00:39 UTC
    Maybe stringification is faster
    use Data::Dump qw/pp dd/; local $" = $; ; $p=[1,2]; $h{"@$p"}=12; $h{3,4}=34; dd %h;

    ("3\x1C4", 34, "1\x1C2", 12)

    no time to benchmark, sorry! (no pun intended ;-)

    update

    Probably is fastest to keep points all the time in that representation.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

      Update: Regarding performance, stringification "@$p" reaches join(':',@$p) in Perl 5.22 and onwards. See this post.

      Hi LanX,

      Added lan_hash and lan_look, based on str_hash and str_look respectively. The stringification is a nice catch. Surprisingly, I didn't expect for it to run slower compared to join(':', @$p) and thought as fast or faster.

      sub lan_hash { # print "LanX_hash---------------\n"; my %cells; # insert the points into the hash for my $p (@points) { my $h = $p->[0] . ' ' . $p->[1]; my ( $x, $y ) = split ' ', $h; # print "x='$p->[0]' y='$p->[1]' h='$h' (x='$x' y='$y')\n"; if ($x != $p->[0]) { die; } if ($y != $p->[1]) { die; } $cells{$h} = undef; # ++$cells{$h}; } scalar(keys %cells) == $npoints or die; # lookup each points in the hash for my $p (@points) { my $h = $p->[0] . ' ' . $p->[1]; exists $cells{$h} or die; } exists $cells{'notfound'} and die; exists $cells{'notfound2'} and die; exists $cells{'notfound3'} and die; return \%cells; } sub lan_look { my $cells = shift; for my $p (@points) { exists $cells->{ "@$p" } or die; } exists $cells->{'notfound'} and die; exists $cells->{'notfound2'} and die; exists $cells->{'notfound3'} and die; } my $big_ref = big_hash(); my $lan_ref = lan_hash(); my $mat_ref = mat_hash(); my $pak_ref = pak_hash(); my $st2_ref = str_hash(); my $st3_ref = str_hash(); my $str_ref = str_hash(); timethese 200000, { Big => sub { big_look($big_ref) }, # $cells->{ ($p->[1] << 32) | ($ +p->[0] & 0xFFFFFFFF) } Lan => sub { lan_look($lan_ref) }, # $cells->{ "@$p" } Mat => sub { mat_look($mat_ref) }, # $cells->{ $_->[0] }{ $_->[1] } Pak => sub { pak_look($pak_ref) }, # $cells->{ pack "ii", $p->[0], +$p->[1] } St2 => sub { st2_look($st2_ref) }, # $cells->{ join(':', @$p) } St3 => sub { st3_look($st3_ref) }, # $cells->{ $str } # optimized Str => sub { str_look($str_ref) }, # $cells->{ $p->[0] .':'. $p->[1 +] } };

      Results from CentOS 7.3 VM running Perl 5.16.3.

      Regards, Mario

Re^3: Fastest way to lookup a point in a set
by oiskuu (Hermit) on Aug 07, 2017 at 17:32 UTC

    You did not answer his question though. Can you batch the queries or do you need to run them one by one?

    If batching is allowed, then LanX has more or less revealed the solution. The problem is then very similar to sorting, and the fastest way to sort is probably the usual radix sort. (Maybe with AVX512 the balance has tipped in favor of merge sort; but I haven't tested that.)

    Once you have binned/sorted the queries, the lookup is essentially a linear pass over the (sorted) dataset. Don't loop over queries, map them.

      Thanks for the tip. There is some scope for batching, I believe, and I hadn't thought of sorting. I'll write a new root node in the next day or so with details on the full problem.

      Update: the new root node: High Performance Game of Life

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1196828]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (9)
As of 2024-04-18 15:42 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found