RFC: List::Extract

Description

I have a module I currently call List::Extract that has a single subroutine: extract. It's used like

my @extracted = extract { ... } @list;
[download]

It removes and returns the elements that tests true for the code in the block, like grep and splice combined. The code above is equivalent to

my @extracted;
my $c = 0;
while ($c < @$list) {
    local *_ = \$list->[$c];
    if (do { ... }) {
        push @extracted, splice @$list, $c, 1;
    }
    else {
        $c++;
    }
}
[download]

Motivation

The reason I wrote this routine was that I haven't found any idiom to achieve the same thing that I'm pleased with. The only other way that I'm lazy enough to write that's significally shorter and logically simpler than the code above is

my @extracted = grep { ... } @list;
@list = grep { not ... } @list;
[download]

but I don't like that, mostly because it duplicates the logic in the grep block, and factoring it out would make it unpretty again:

my $filter = sub { ... };
my @extracted = grep { $filter->() } @list;
@list = grep { not $filter->() } @list;
[download]

and still I iterate the array twice.

My questions

Is there any other module that does the same thing? (I haven't found any on CPAN.)
Do you have any suggestion for a better name?
Do you think that modifications to $_ should be discarded for elements in the returned list?
Do you think that modifications to $_ should be discarded for elements kept in the array?

lodin

Update: Uploaded as List::Extract to CPAN.

Comment on RFC: List::Extract Select or Download Code

Replies are listed 'Best First'.
Re: RFC: List::Extract by johngg (Canon) on Nov 24, 2007 at 21:48 UTC
By using reverse twice combined with grep and map you can splice out the elements you want in one fell swoop. `use strict; use warnings; my @array = ( 3, 13, 73, 4, 29, 38 ); print qq{Before:\n}, qq{ @array\n}; my $rcExtract = sub { my $toTest = shift; return $toTest > 31 ? 1 : 0; }; my @extracted = reverse map { splice @array, $_, 1 } grep { $rcExtract->( $array[ $_ ] ) } reverse 0 .. $#array; print qq{After:\n}, qq{ @array\n}, qq{ @extracted\n};` [download] The output. `Before: 3 13 73 4 29 38 After: 3 13 4 29 73 38` [download] I hope this is of interest. Cheers, JohnGG	[reply] [d/l] [select]
Re^2: RFC: List::Extract by lodin (Hermit) on Nov 24, 2007 at 23:09 UTC
Nice. This instantly reminded me about this node of mine, where I did almost the same thing. But I didn't think too much of it then. Thanks for reminding me. I think it's just, not far but enough, across the line where it's worth putting in a module though. lodin	[reply]
Re^3: RFC: List::Extract by johngg (Canon) on Nov 24, 2007 at 23:43 UTC
I hadn't seen your node as that thread happened while I was away on holiday watching madmen race motorcycles on the Isle of Man :-) I find `reverse` useful in a lot of situations and have advocated it's use before. What I particularly like is the way you can apply it in both list and scalar contexts, the latter to reverse a string. I use both in this post. Perl is just full of amazing tools. I love it. Cheers, JohnGG	[reply] [d/l]
Re^2: RFC: List::Extract by kyle (Abbot) on Nov 25, 2007 at 03:23 UTC
I like it! I wonder, though, what the performance hit is of looping over the list twice vs. once. Without doing any benchmarking, I suspect that it depends on the size of the list and how many elements are removed/kept. Just a thought.	[reply]
Re^3: RFC: List::Extract by johngg (Canon) on Nov 27, 2007 at 13:48 UTC
I have put together a benchmark testing lodin's code and the double-`grep` plus merlyn's and my code against different sized arrays and sifting out differing amounts of data. I couldn't get to grips with the subroutine prototype so I took that out of merlyn's routine. Here is the code. Read more... (3 kB) Here's the results. Read more... (7 kB) It seems to show that my code has a bit of an advantage when the arrays are not too large and when less data is being sifted out. It is noticeable that merlyn's code isn't impacted the more data that is sifted whereas mine is. I guess that is because he is always building two arrays whereas my `map / splice` has to do more work the more data is sifted. Also note the sudden elevation of `twoGreps` with large arrays for the 30%, 50% and 70% sifts but not for 10% or 90%. I've cocked up benchmarks before, however, so take all this with a pinch of salt. I won't be surprised if somebody tells me I've done the same this time. I hope this is of interest. Cheers, JohnGG	[reply] [d/l] [select]
Re: RFC: List::Extract by eserte (Deacon) on Nov 24, 2007 at 21:11 UTC
I think this function should go to List::MoreUtils (no, it's not yet there).	[reply]
Re^2: RFC: List::Extract by doom (Deacon) on Nov 25, 2007 at 00:22 UTC
Overall, I like the idea of "extract", and I don't know of any equivalent (it sounds vaguely like some sort of set operation, but there's nothing like it in Set::Scalar, for example). It would indeed be a good addition to List::MoreUtils, if the maintainers of it are amendable -- but I can understand why you might want to just go ahead and package it up on CPAN and skip the negotiation process. I like the name "extract", myself (an alternate name might be "move"). I think the naming implies that any modifications to $_ should be discarded (but maybe that's reasonable in an un-perl like way, eh?). I would suggest writing variant versions that preserve side-effects but use different names for them, possibly "map_extract" or something like that.	[reply]
Re^3: RFC: List::Extract by lodin (Hermit) on Nov 25, 2007 at 01:37 UTC
Thanks for you feedback. About keeping or discarding changes. Currently I'm leaning towards keeping changes in the returned list, but discarding changes for the other elements. Here's an example that motivates keeping the changes for the returned list: `my @keywords = qw/ foo !bar baz /; my @exclude = extract { s/^!// } @keywords;` [download] If changes are discarded you'd have to do ugly workarounds to not duplicate the logic. But if the changes are kept and you don't want them it's trivial to change the behaviour: just add `local $_ = $_;`. The reason I want to discard changes for elements left in the array is that if changes is kept for the returned array then changes are kept for all elements, and then you can usually just as well do those changes to the array before `extract`. lodin	[reply] [d/l] [select]
Re^4: RFC: List::Extract by mreece (Friar) on Nov 25, 2007 at 03:52 UTC
Re^5: RFC: List::Extract by doom (Deacon) on Nov 25, 2007 at 22:13 UTC
Re^3: RFC: List::Extract by doom (Deacon) on Nov 25, 2007 at 22:43 UTC
By the way, I looked around on CPAN again for something like "extract". The closest I've found is List::Part, which takes a list and subdivides into two according to a given critereon. `my ($success, $failure) = part { /^ERR/ } <LOG>;` [download] It follows the "grep" convention: modifications to $_ effect both the source and the results, as I verified with a test script: `use List::Part; my @candidates = qw( !britney !paris bettie patti ); my ($hot, $not) = part { s{^!}{} } @candidates; print "hot: ", Dumper( $hot ), "\n"; print "not: ", Dumper( $not ), "\n"; print "stuff: ", Dumper( \@candidates ), "\n";` [download]	[reply] [d/l] [select]
Re: RFC: List::Extract by merlyn (Sage) on Nov 25, 2007 at 14:16 UTC
Untested, but I have a knack at this... :) `sub extract (&\@) { my ($code, $aref) = @_; my @return; my @replace; for (@$aref) { # aliases $_ to the item in the original list if (&$code) { push @return, $_; } else { push @replace, $_; } } @$aref = @replace; return @return; }` [download] To answer your last questions, it keeps modifications both to the input and output arrays. -- Randal L. Schwartz, Perl hacker	[reply] [d/l]
Re: RFC: List::Extract by Fletch (Bishop) on Nov 25, 2007 at 14:53 UTC
An idea as to naming inspiration: Ruby has similar methods on Array called `select!` and `reject!` which are in place versions of `select` and `reject`. The latter pair work similar to Perl's `grep` (returning elements for which the block provided returns true (or for which it doesn't return true for `reject`)), whereas the bang-suffixed versions follow Ruby convention and diddle their argument in place to remove the affected elements. (And for the curious, Ruby does have a `a.grep( pattern )` method on the array but it works using the `===` comparison operator to check `pattern === element` rather than using a predicate block; if it's given a block it behaves like a map on the matching items. There is no `grep!` though, which is kinda strange seeing that the other filtering methods have mutators.) Update: Oh wait, you're returning the removed elements as well. `select!` and `reject!` just return the newly changed array, or `nil` if no modifications were made. Duuur. Me not awake yet. Never mind, this is actually more like `partition` (which takes a block and returns two arrays, one for elements for which the block returned true and the other for elements for which it returned false), but there's not a mutator variant of that. We're looking for a Perl and Database Developer for Corporate Investments Group.	[reply] [d/l] [select]