Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re: Re: Set Operators

by Notromda (Pilgrim)
on Oct 26, 2002 at 18:54 UTC ( [id://208264]=note: print w/replies, xml ) Need Help??


in reply to Re: Set Operators
in thread Set Operators

Funny that this question came up today, as I just had to figure this out on my own. I needed to exclude a list of items from a search, so I immediately made a list of the items:

my @items = qw (one two three etc);

but then I was stumped. Looping through the whole list every time to detect a match seems not to be an optimal method. And writing:

my %items = ( one=>1, two=>1, three=>1 );

doesn't feel very good either.

Map to the rescue!

my %items = map {$_ => 1} qw (one two three);

I'm still fairly new to map, so I'm kinda happy that I found that one on my own... Take note, fellow acolytes! :)

++jsegal

Replies are listed 'Best First'.
Re: Re: Re: Set Operators
by DaveH (Monk) on Oct 27, 2002 at 14:24 UTC

    The map operator might look nice, but it isn't always the most suitable operator to use. It is a very powerful tool, but IMHO it is not the best for this job.

    The classic idiom for this type of "set" operation is to use the 'x' (multiplication) operator, in combination with hash slices. e.g.

    my (%hash, @array); @array = ('one', 'two', 'three'); @hash{@array} = (1) x @array; if (exists $hash{'one'}) { .... }

    To demonstrate just how efficient this method is, I have written a small test case using Benchmark.pm. I've tried to include all the methods which have been shown in reply so far. I'm not guaranteeing that Perl hasn't optimised away some of the operations (which could skew the results), but I hope that it provides a fairly accurate measure of the different approaches. If anyone can see any glaring errors or ommisions, please reply.

    use Benchmark qw(timethese); my $iterations = 300_000; my @array = ( "best", 1..49, "average", 51..99, "worst" ); for my $search (qw( best average worst )) { print "Timing '$search' case:\n"; timethese($iterations, { 'map' => sub { my %hash = (); %hash = map { $_=>1 } @array; if ($hash{$search}) { 1 } }, 'xop' => sub { my %hash = (); @hash{@array} = (1) x @array; if ($hash{$search}) { 1 } }, 'for1' => sub { my %hash = (); for (@array) { $hash{$_} = 1 } if ($hash{$search}) { 1 } }, 'for2' => sub { for (@array) { if ($_ eq $search) { 1; last } } }, 'for3' => sub { if (do { my $f=0; $_ eq $search and ($f=1),last for @array; $f }) { 1 } }, 'grep' => sub { if (grep $search eq $_, @array) { 1 } }, }); } __END__ Timing 'best' case: Benchmark: timing 300000 iterations of for1, for2, for3, grep, map, xo +p... for1: 116 wallclock secs (117.60 usr + 0.00 sys = 117.60 CPU) @ + 2551.02/s (n=300000) for2: 1 wallclock secs ( 1.21 usr + 0.00 sys = 1.21 CPU) @ 24 +7933.88/s (n=300000) for3: 3 wallclock secs ( 2.36 usr + 0.00 sys = 2.36 CPU) @ 12 +7118.64/s (n=300000) grep: 27 wallclock secs (26.20 usr + 0.00 sys = 26.20 CPU) @ 11 +450.38/s (n=300000) map: 236 wallclock secs (234.47 usr + 0.00 sys = 234.47 CPU) @ + 1279.48/s (n=300000) xop: 93 wallclock secs (92.93 usr + 0.00 sys = 92.93 CPU) @ 32 +28.24/s (n=300000) Timing 'average' case: Benchmark: timing 300000 iterations of for1, for2, for3, grep, map, xo +p... for1: 111 wallclock secs (110.18 usr + 0.00 sys = 110.18 CPU) @ + 2722.82/s (n=300000) for2: 20 wallclock secs (18.35 usr + 0.00 sys = 18.35 CPU) @ 16 +348.77/s (n=300000) for3: 20 wallclock secs (18.29 usr + 0.00 sys = 18.29 CPU) @ 16 +402.41/s (n=300000) grep: 26 wallclock secs (26.09 usr + 0.00 sys = 26.09 CPU) @ 11 +498.66/s (n=300000) map: 220 wallclock secs (219.22 usr + 0.00 sys = 219.22 CPU) @ + 1368.49/s (n=300000) xop: 93 wallclock secs (92.83 usr + 0.00 sys = 92.83 CPU) @ 32 +31.71/s (n=300000) Timing 'worst' case: Benchmark: timing 300000 iterations of for1, for2, for3, grep, map, xo +p... for1: 109 wallclock secs (109.24 usr + 0.00 sys = 109.24 CPU) @ + 2746.25/s (n=300000) for2: 35 wallclock secs (35.65 usr + 0.00 sys = 35.65 CPU) @ 84 +15.15/s (n=300000) for3: 34 wallclock secs (34.06 usr + 0.00 sys = 34.06 CPU) @ 88 +07.99/s (n=300000) grep: 26 wallclock secs (26.19 usr + 0.00 sys = 26.19 CPU) @ 11 +454.75/s (n=300000) map: 222 wallclock secs (222.51 usr + 0.00 sys = 222.51 CPU) @ + 1348.25/s (n=300000) xop: 93 wallclock secs (93.05 usr + 0.00 sys = 93.05 CPU) @ 32 +24.07/s

    Cheers,

    -- Dave :-)

      $hash{@array} is wrong, you want @hash{@array}.

      But since you're (rightly) using exists, you can reduce that even further:

      @hash{@array} = (); Some quick testing shows this is about 25% faster:

      Makeshifts last the longest.

        Just caught that one myself! :-)

        I've updated my example - as it now shows, the 'xop' is not that much faster than just iterating over the array with a properly constructed 'for' loop. In fact, in this example, 'grep' has turned out to be the most consistent performer (but that might be because there is no overhead in creating a hash). Grep is probably fastest because it is coded directly in C, rather than including any extra Perl contructs. Just goes to show that you should always test and test again before you go proclaiming one method over another! :-) (Serves me right for trying to be a smart-ass!)

        Cheers,

        -- Dave :-)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://208264]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (3)
As of 2024-04-25 17:15 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found