Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

RFC: List::Extract

by lodin (Hermit)
on Nov 24, 2007 at 20:17 UTC ( [id://652756]=perlmeditation: print w/replies, xml ) Need Help??

Description

I have a module I currently call List::Extract that has a single subroutine: extract. It's used like

my @extracted = extract { ... } @list;
It removes and returns the elements that tests true for the code in the block, like grep and splice combined. The code above is equivalent to
my @extracted; my $c = 0; while ($c < @$list) { local *_ = \$list->[$c]; if (do { ... }) { push @extracted, splice @$list, $c, 1; } else { $c++; } }

Motivation

The reason I wrote this routine was that I haven't found any idiom to achieve the same thing that I'm pleased with. The only other way that I'm lazy enough to write that's significally shorter and logically simpler than the code above is

my @extracted = grep { ... } @list; @list = grep { not ... } @list;
but I don't like that, mostly because it duplicates the logic in the grep block, and factoring it out would make it unpretty again:
my $filter = sub { ... }; my @extracted = grep { $filter->() } @list; @list = grep { not $filter->() } @list;
and still I iterate the array twice.

My questions

  • Is there any other module that does the same thing? (I haven't found any on CPAN.)
  • Do you have any suggestion for a better name?
  • Do you think that modifications to $_ should be discarded for elements in the returned list?
  • Do you think that modifications to $_ should be discarded for elements kept in the array?

lodin

Update: Uploaded as List::Extract to CPAN.

Replies are listed 'Best First'.
Re: RFC: List::Extract
by johngg (Canon) on Nov 24, 2007 at 21:48 UTC
    By using reverse twice combined with grep and map you can splice out the elements you want in one fell swoop.

    use strict; use warnings; my @array = ( 3, 13, 73, 4, 29, 38 ); print qq{Before:\n}, qq{ @array\n}; my $rcExtract = sub { my $toTest = shift; return $toTest > 31 ? 1 : 0; }; my @extracted = reverse map { splice @array, $_, 1 } grep { $rcExtract->( $array[ $_ ] ) } reverse 0 .. $#array; print qq{After:\n}, qq{ @array\n}, qq{ @extracted\n};

    The output.

    Before: 3 13 73 4 29 38 After: 3 13 4 29 73 38

    I hope this is of interest.

    Cheers,

    JohnGG

      Nice. This instantly reminded me about this node of mine, where I did almost the same thing. But I didn't think too much of it then. Thanks for reminding me.

      I think it's just, not far but enough, across the line where it's worth putting in a module though.

      lodin

        I hadn't seen your node as that thread happened while I was away on holiday watching madmen race motorcycles on the Isle of Man :-)

        I find reverse useful in a lot of situations and have advocated it's use before. What I particularly like is the way you can apply it in both list and scalar contexts, the latter to reverse a string. I use both in this post.

        Perl is just full of amazing tools. I love it.

        Cheers,

        JohnGG

      I like it!

      I wonder, though, what the performance hit is of looping over the list twice vs. once. Without doing any benchmarking, I suspect that it depends on the size of the list and how many elements are removed/kept.

      Just a thought.

        I have put together a benchmark testing lodin's code and the double-grep plus merlyn's and my code against different sized arrays and sifting out differing amounts of data. I couldn't get to grips with the subroutine prototype so I took that out of merlyn's routine. Here is the code.

        Here's the results.

        It seems to show that my code has a bit of an advantage when the arrays are not too large and when less data is being sifted out. It is noticeable that merlyn's code isn't impacted the more data that is sifted whereas mine is. I guess that is because he is always building two arrays whereas my map / splice has to do more work the more data is sifted. Also note the sudden elevation of twoGreps with large arrays for the 30%, 50% and 70% sifts but not for 10% or 90%.

        I've cocked up benchmarks before, however, so take all this with a pinch of salt. I won't be surprised if somebody tells me I've done the same this time.

        I hope this is of interest.

        Cheers,

        JohnGG

Re: RFC: List::Extract
by eserte (Deacon) on Nov 24, 2007 at 21:11 UTC
    I think this function should go to List::MoreUtils (no, it's not yet there).
      Overall, I like the idea of "extract", and I don't know of any equivalent (it sounds vaguely like some sort of set operation, but there's nothing like it in Set::Scalar, for example). It would indeed be a good addition to List::MoreUtils, if the maintainers of it are amendable -- but I can understand why you might want to just go ahead and package it up on CPAN and skip the negotiation process. I like the name "extract", myself (an alternate name might be "move"). I think the naming implies that any modifications to $_ should be discarded (but maybe that's reasonable in an un-perl like way, eh?). I would suggest writing variant versions that preserve side-effects but use different names for them, possibly "map_extract" or something like that.

        Thanks for you feedback.

        About keeping or discarding changes. Currently I'm leaning towards keeping changes in the returned list, but discarding changes for the other elements.

        Here's an example that motivates keeping the changes for the returned list:

        my @keywords = qw/ foo !bar baz /; my @exclude = extract { s/^!// } @keywords;
        If changes are discarded you'd have to do ugly workarounds to not duplicate the logic. But if the changes are kept and you don't want them it's trivial to change the behaviour: just add local $_ = $_;.

        The reason I want to discard changes for elements left in the array is that if changes is kept for the returned array then changes are kept for all elements, and then you can usually just as well do those changes to the array before extract.

        lodin

        By the way, I looked around on CPAN again for something like "extract". The closest I've found is List::Part, which takes a list and subdivides into two according to a given critereon.
        my ($success, $failure) = part { /^ERR/ } <LOG>;

        It follows the "grep" convention: modifications to $_ effect both the source and the results, as I verified with a test script:

        use List::Part; my @candidates = qw( !britney !paris bettie patti ); my ($hot, $not) = part { s{^!}{} } @candidates; print "hot: ", Dumper( $hot ), "\n"; print "not: ", Dumper( $not ), "\n"; print "stuff: ", Dumper( \@candidates ), "\n";
Re: RFC: List::Extract
by merlyn (Sage) on Nov 25, 2007 at 14:16 UTC
    Untested, but I have a knack at this... :)
    sub extract (&\@) { my ($code, $aref) = @_; my @return; my @replace; for (@$aref) { # aliases $_ to the item in the original list if (&$code) { push @return, $_; } else { push @replace, $_; } } @$aref = @replace; return @return; }
    To answer your last questions, it keeps modifications both to the input and output arrays.
Re: RFC: List::Extract
by Fletch (Bishop) on Nov 25, 2007 at 14:53 UTC

    An idea as to naming inspiration:

    Ruby has similar methods on Array called select! and reject! which are in place versions of select and reject. The latter pair work similar to Perl's grep (returning elements for which the block provided returns true (or for which it doesn't return true for reject)), whereas the bang-suffixed versions follow Ruby convention and diddle their argument in place to remove the affected elements.

    (And for the curious, Ruby does have a a.grep( pattern ) method on the array but it works using the === comparison operator to check pattern === element rather than using a predicate block; if it's given a block it behaves like a map on the matching items. There is no grep! though, which is kinda strange seeing that the other filtering methods have mutators.)

    Update: Oh wait, you're returning the removed elements as well. select! and reject! just return the newly changed array, or nil if no modifications were made. Duuur. Me not awake yet.

    Never mind, this is actually more like partition (which takes a block and returns two arrays, one for elements for which the block returned true and the other for elements for which it returned false), but there's not a mutator variant of that.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://652756]
Approved by Arunbear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (5)
As of 2024-04-25 11:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found