Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re: Removing elemets from an array

by karlgoethebier (Abbot)
on Dec 28, 2012 at 16:51 UTC ( [id://1010720]=note: print w/replies, xml ) Need Help??


in reply to Removing elemets from an array

Just for total completeness ;-)

#!/usr/bin/perl + use strict; use warnings; use Set::Scalar; use Data::Dumper; my @arcb = ( 450, 625, 720, 645 ); my @arca = ( 625, 645 ); my $s1 = Set::Scalar->new(@arcb); my $s2 = Set::Scalar->new(@arca); @arcb = @{ $s1 - $s2 }; print Dumper( \@arcb ); __END__ $VAR1 = [ 450, 720 ];

Regards, Karl

P.S.: No, i don't get paid for endorsement of Set::Scalar ;-) And yes, i'm repeating myself: Re: Is it possible to find the matching words and the percentage of matching words between two texts?

Update

#!/usr/bin/perl + use Benchmark qw ( :hireswallclock cmpthese timethese ); use strict; # use warnings; use Set::Scalar; use Data::Dumper; use List::Compare; our @arcb = ( 450, 625, 720, 645 ); our @arca = ( 625, 645 ); sub davido { our @arcb; our @arca; @arcb = do { my %exclude; @exclude{@arca} = (); grep { !exists $exclude{$_} } @arcb; }; # print "@arcb\n"; } sub karlgoethebier { our @arcb; our @arca; my $s1 = Set::Scalar->new(@arcb); my $s2 = Set::Scalar->new(@arca); @arcb = @{ $s1 - $s2 }; # print Dumper( \@arcb ); } sub Lotus1 { our @arcb; our @arca; my $lc = List::Compare->new( \@arca, \@arcb ); @arcb = $lc->get_Ronly; # print "arcb: @arcb\n"; } sub LanX { my ( %arcb, %arca ); our @arcb; our @arca; @arcb{@arcb} = (); @arca{@arca} = (); delete @arcb{@arca}; @arcb = keys %arcb; # print Dumper( \@arcb ); } sub linuxer { our @arcb; our @arca; my %unwanted; $unwanted{$_}++ for @arca; @arcb = grep { !$unwanted{$_} } @arcb; # print "@arcb\n"; } sub nithins { our @arcb; our @arca; foreach my $a (@arca) { for ( my $i = 0 ; $i < scalar(@arcb) ; $i++ ) { delete $arcb[$i] if ( $a == $arcb[$i] ); } } # print "@arcb"; } # karlgoethebier; # Lotus1; # LanX; # davido; # linuxer; # nithins; my $results = timethese( -10, { 'karlgoethebier' => 'karlgoethebier', 'Lotus1' => 'Lotus1', 'LanX' => 'LanX', 'davido' => 'davido', 'linuxer' => 'linuxer', 'nithins' => 'nithins', } ); cmpthese($results); __END__ Benchmark: running LanX, Lotus1, davido, karlgoethebier, linuxer, +nithins for at least 10 CPU seconds... LanX: 10.513 wallclock secs (10.51 usr + 0.00 sys = 10.51 CPU) @ +378147.76/s (n=3974333) Lotus1: 10.4951 wallclock secs (10.49 usr + 0.00 sys = 10.49 CPU) + @ 23020.02/s (n=241480) davido: 10.5748 wallclock secs (10.58 usr + 0.01 sys = 10.59 CPU) + @ 378623.51/s (n=4009623) karlgoethebier: 10.4698 wallclock secs (10.47 usr + 0.00 sys = 10 +.47 CPU) @ 9713.28/s (n=101698) linuxer: 10.4635 wallclock secs (10.46 usr + 0.00 sys = 10.46 CPU +) @ 434439.87/s (n=4544241) nithins: 10.612 wallclock secs (10.61 usr + 0.00 sys = 10.61 CPU) + @ 589604.05/s (n=6255699) Rate karlgoethebier Lotus1 LanX davido linux +er nithins karlgoethebier 9713/s -- -58% -97% -97% -9 +8% -98% Lotus1 23020/s 137% -- -94% -94% -9 +5% -96% LanX 378148/s 3793% 1543% -- -0% -1 +3% -36% davido 378624/s 3798% 1545% 0% -- -1 +3% -36% linuxer 434440/s 4373% 1787% 15% 15% +-- -26% nithins 589604/s 5970% 2461% 56% 56% 3 +6% --

"The first priority of Set::Scalar is to be a convenient interface to sets. While not designed to be slow or big, neither has it been designed to be fast or compact. Jarkko Hietaniemi <jhi@iki.fi>"

«The Crux of the Biscuit is the Apostrophe»

 

Replies are listed 'Best First'.
Re^2: Removing elemets from an array
by linuxer (Curate) on Dec 30, 2012 at 17:58 UTC

    I hope you knoware aware, that your subroutines modify the data arrays globally?

    Take this example:

    #! /usr/bin/perl -l + use strict; + use warnings; + + our @array = ( 1 .. 5 ); + + sub foo { + our @array; + + print " inside before modification: @array"; + + # work with @array and modify it + @array = ( 'a' .. 'e' ); + + print " inside after modification: @array"; + } + + # now do the work + print "outside before calling foo(): @array"; + foo(); + print "outside after calling foo(): @array";

    This results in:

    outside before calling foo(): 1 2 3 4 5 inside before modification: 1 2 3 4 5 inside after modification: a b c d e outside after calling foo(): a b c d e

    As one can see, outside the sub, the array is changed as well.

    So afterTransferred to your benchmark script: With the first call of the first subroutine, the data arrays are modified. All following calls use that modified data and might change the data again. This might produce erroneous benchmark results.

    In the current setting, this might be not very severe. But there might be situations, where this is fatal!

    You should use separate arrays inside your sub routines (my @work = @array;), or localize the variables (local @array = @array;) inside the sub routines.

      "I hope you are aware, that your subroutines modify the data arrays globally?"

      Actually yes. Please don't ask why i did it this way ;-)

      Thanks, HNY and best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

      Thanks for pointing this out. I knew I should have tested it since I rarely use 'our' and didn't really follow its use here. I just updated the code and redid the results of the tests.

Re^2: Removing elemets from an array
by Lotus1 (Vicar) on Dec 29, 2012 at 17:59 UTC

    Update: Thanks to linuxer for pointing out that the original data were modified by the functions and therefore after the first call the other functions had a smaller data set. Here is the corrected version and the results.

    Updated again: realized I had warnings turned on and the nithins code warnings slowed down the benchmark a bunch when they printed to STDERR. Fixed the nithins approach.

    As I've seen BrowserUK do before I replaced the toy data set with something more realistic and the nithins approach became the slowest.

    I started with 10^6 and then 10^5 elements in the range but got warnings about too few iterations to make a comparison.

    #!/usr/bin/perl + use Benchmark qw ( :hireswallclock cmpthese timethese ); use strict; # use warnings; use Set::Scalar; use Data::Dumper; use List::Compare; # our @arcb = ( 450, 625, 720, 645 ); # our @arca = ( 625, 645 ); our @arcb = grep {$_ % 2} (1..10000); our @arca = grep {not $_ % 5} (1..10000); sub davido { #our @arcb; #our @arca; local @arcb = @arcb; local @arca = @arca; @arcb = do { my %exclude; @exclude{@arca} = (); grep { !exists $exclude{$_} } @arcb; }; # print "@arcb\n"; } sub karlgoethebier { #our @arcb; #our @arca; local @arcb = @arcb; local @arca = @arca; my $s1 = Set::Scalar->new(@arcb); my $s2 = Set::Scalar->new(@arca); @arcb = @{ $s1 - $s2 }; # print Dumper( \@arcb ); } sub Lotus1 { #our @arcb; #our @arca; local @arcb = @arcb; local @arca = @arca; my $lc = List::Compare->new( \@arca, \@arcb ); @arcb = $lc->get_Ronly; # print "arcb: @arcb\n"; } sub LanX { my ( %arcb, %arca ); #our @arcb; #our @arca; local @arcb = @arcb; local @arca = @arca; @arcb{@arcb} = (); # @arca{@arca} = (); update here delete @arcb{@arca}; @arcb = keys %arcb; # print Dumper( \@arcb ); } sub linuxer { #our @arcb; #our @arca; local @arcb = @arcb; local @arca = @arca; my %unwanted; $unwanted{$_}++ for @arca; @arcb = grep { !$unwanted{$_} } @arcb; # print "@arcb\n"; } sub nithins { #our @arcb; #our @arca; local @arcb = @arcb; local @arca = @arca; foreach my $a (@arca) { for ( my $i = 0 ; $i < scalar(@arcb) ; $i++ ) { #delete $arcb[$i] if ( $a == $arcb[$i] ); #fixed the warnings by using splice splice @arcb,$i,1 if ( $a == $arcb[$i] ); } } # print "@arcb"; } # karlgoethebier; # Lotus1; # LanX; # davido; # linuxer; # nithins; my $results = timethese( -20, { 'karlgoethebier' => 'karlgoethebier', 'Lotus1' => 'Lotus1', 'LanX' => 'LanX', 'davido' => 'davido', 'linuxer' => 'linuxer', 'nithins' => 'nithins', } ); cmpthese($results); __END__ Benchmark: running LanX, Lotus1, davido, karlgoethebier, linuxer, nith +ins for at least 20 CPU seconds... LanX: 21.5303 wallclock secs (21.14 usr + 0.38 sys = 21.52 CPU) + @ 143.25/s (n=3082) Lotus1: 20.4152 wallclock secs (20.27 usr + 0.14 sys = 20.41 CPU) + @ 24.55/s (n=501) davido: 20.1223 wallclock secs (19.72 usr + 0.38 sys = 20.09 CPU) + @ 110.78/s (n=2226) karlgoethebier: 22.1035 wallclock secs (22.06 usr + 0.03 sys = 22.09 +CPU) @ 22.22/s (n=491) linuxer: 21.5713 wallclock secs (21.14 usr + 0.42 sys = 21.56 CPU) + @ 147.39/s (n=3178) nithins: 21.1337 wallclock secs (21.09 usr + 0.00 sys = 21.09 CPU) + @ 0.66/s (n=14) Rate nithins karlgoethebier Lotus1 davido La +nX linuxer nithins 0.664/s -- -97% -97% -99% -10 +0% -100% karlgoethebier 22.2/s 3248% -- -9% -80% -8 +4% -85% Lotus1 24.6/s 3599% 10% -- -78% -8 +3% -83% davido 111/s 16590% 398% 351% -- -2 +3% -25% LanX 143/s 21482% 545% 483% 29% +-- -3% linuxer 147/s 22106% 563% 500% 33% +3% -- Note: the results below are the original version with the modified dat +a sets from the commented out 'our' array declarations. Benchmark: running LanX, Lotus1, davido, karlgoethebier, linuxer, nith +ins for at least 20 CPU seconds... LanX: 24.0026 wallclock secs (22.73 usr + 0.00 sys = 22.73 CPU) + @ 608.25/s (n=13828) Lotus1: 22.2431 wallclock secs (21.50 usr + 0.30 sys = 21.80 CPU) + @ 27.53/s (n=600) davido: 20.17 wallclock secs (19.81 usr + 0.33 sys = 20.14 CPU) @ + 199.25/s (n=4013) karlgoethebier: 20.2918 wallclock secs (19.95 usr + 0.33 sys = 20.28 +CPU) @ 14.35/s (n=291) linuxer: 20.1558 wallclock secs (19.87 usr + 0.26 sys = 20.14 CPU) + @ 321.91/s (n=6483) nithins: 20.9194 wallclock secs (20.89 usr + 0.00 sys = 20.89 CPU) + @ 0.77/s (n=16) Rate nithins karlgoethebier Lotus1 davido linuxe +r LanX nithins 0.766/s -- -95% -97% -100% -100 +% -100% karlgoethebier 14.3/s 1773% -- -48% -93% -96 +% -98% Lotus1 27.5/s 3494% 92% -- -86% -91 +% -95% davido 199/s 25915% 1289% 624% -- -38 +% -67% linuxer 322/s 41932% 2144% 1069% 62% - +- -47% LanX 608/s 79319% 4139% 2110% 205% 89 +% --

    updated with LanX's update

      This is very interesting. But for the moment i don't have any idea why it is so.

      Perhaps some of the involved brothers would like to comment/interprete the result so we can learn something new?

      Best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

        Every operation in Perl - like any abstraction - has an overhead which can't be compensated by its C-implementation.

        So tending to use as few as possible perl-commands which do mass-operations means to reduce overhead and delaying the task to highly optimized C.

        Loops (including maps) are just multiplying the amount of executed commands (just imagine the linearized alternative which is even faster as the loop...)

        so my approach is the fastest because its basically reduced to only 3 perl commandsą

        1. setting a hash

        2. deleting a slice from that hash

        3. reading the resulting hash

        OTOH my approach has drawbacks, depending on the task, it's only suitable for real sets of strings.

        Arrays can contain repeated data or other datatypes like refs.

        EDIT: you might be interested in Using hashes for set operations...

        Cheers Rolf

        PS: of course there are still loops working under the hood, but they are already optimized in C.

        Thanks for setting up the benchmark in the first place, it has been fun seeing the performance of the different approaches.

        Nested loops don't scale well. When there were only 2 items for the outer loop it was really fast. I've seen folks here toss around the Big O notation and although I hadn't heard of it before Perlmonks I figured out they were talking about linear growth as opposed to exponential or some other function for the execution time related to the size of the data set.

        The nested loop approach seems to be O(n^2) but I'm still learning so hopefully someone will weigh in if I'm off base.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1010720]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (4)
As of 2024-03-29 05:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found