PerlMonks  

Re: List::Compare

by dragonchild (Archbishop)
on Mar 25, 2004 at 17:49 UTC ( [id://339814]=note )


in reply to List::Compare

I'd like to point out a few shortcomings of List::Compare:
  1. It stringifies almost everything. Specifically, get_bag() is the only method whose results are not stringified. This means it will have serious problems working with references.
  2. It doesn't maintain order. This may not be important in many situations, but a "list" is inherently ordered. Set::Object has similar methods, and actually uses the appropriate term. :-)
  3. It has a bug when dealing with lists of code-references without using the -u (unsorted) flag. Specifically, if the first element in your first list is a code-reference, sort will attempt to use it as the sorting method. That element is then no longer among the things being sorted, so it's lost from the bag.
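
To make the first point concrete, here is a minimal sketch of what stringification does to references. It assumes nothing about List::Compare's internals; it only shows the general Perl behaviour that hash keys are always strings, which is the usual way a "seen" table ends up flattening references. The variable names are illustrative.

```perl
use strict;
use warnings;

my @list = ([1, 2], [3, 4]);    # two array references

# A typical "seen" hash: every key is stored as a string.
my %seen;
$seen{$_}++ for @list;

my @keys = keys %seen;

# The original element is a reference; the key that comes back is not.
printf "original element is a %s reference\n", ref $list[0];
print  "key is a plain string: $keys[0]\n" unless ref $keys[0];
```

The strings that come back look like references when printed, but they can no longer be dereferenced or called.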

I don't mean to suggest that it's a bad module. Perl lists are ... difficult to deal with. (Oh - it also doesn't deal with lists ... it deals with arrays. But, that's another nit.)

------
We are the carpenters and bricklayers of the Information Age.

Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

Re: Re: List::Compare
by McMahon (Chaplain) on Mar 25, 2004 at 17:56 UTC
    Thanks for the warnings!

    But the problem this solves for me is to compare gigantic files full of the output of File::Find. All strings, no references, order immaterial. Under those circumstances, it rocks bells. =)
      What it sounds like you're doing is slurping all the files into memory, then dealing with them using List::Compare. I don't know what beast of a machine you're using, but I doubt most machines can do that without serious thrashing.

      Much better would be to use the unix sort command. This is one of the exact reasons it was designed. It is written in very optimized C, and as such, will always beat out Perl.

        It's surprisingly mellow thrash-wise, actually. I just do
        #SETUP DELETED, CODE WON'T RUN
        my @file1 = <FILE1>;
        my @file2 = <FILE2>;
        my $lc = List::Compare->new(\@file1, \@file2);
        my @file1only = $lc->get_Lonly;
        my @file2only = $lc->get_Ronly;

        print OUT "Files that exist only in FILE1:\n";
        print OUT "IGNORING FILES WITH Tmp OR Temp IN PATHNAME!!!\n\n";
        foreach my $file1 (@file1only) {
            unless (($file1 =~ "Tmp") or ($file1 =~ "Temp")) {
                print OUT $file1;
            }
        }
        print OUT "\n\n";

        print OUT "Files that exist only in FILE2:\n";
        print OUT "IGNORING FILES WITH Tmp OR Temp IN PATHNAME!!!\n\n";
        foreach my $file2 (@file2only) {
            unless (($file2 =~ "Tmp") or ($file2 =~ "Temp")) {
                print OUT $file2;
            }
        }
        and it takes less than a minute to run on my box here at work.

        Unfortunately, we're an all-Windows shop. I've managed to infiltrate a couple of FreeBSD boxes into the test department (cool network tools and a neato disk imaging system called Frisbee), but it's hardly worth moving all of the files over there and back just to save a few seconds using "sort" instead of List::Compare.
Re^2: List::Compare
by jkeenan1 (Deacon) on Jun 04, 2004 at 14:13 UTC
    Until I recently saw a reference to this discussion thread in a CPAN review of List::Compare, I was unaware that my module was being discussed in the Monastery. So today I'd like to share some thoughts on the issues raised. As these issues were raised by various monks, I'll reply to the individual threads today but try to integrate these comments in the talk I am giving on List::Compare at YAPC::NA in Buffalo two weeks from today.

    Re: Dragonchild on List::Compare not working with references:

    The real-world production problems for which I originally developed List::Compare did not include lists of Perl references. The module was not designed to handle them and has never been tested against them.

    If you can suggest a way for the module to detect when a list passed to the constructor (or a function in List::Compare::Functional) contains a reference, I will be happy to generate a 'die' at that point. Otherwise, I'll simply include a warning against this in the next revision of the documentation.

    Re: Dragonchild on List::Compare not preserving order

    I don't see any particular reason why a report on the intersection, union, etc., of two or more lists should come back in any particular order. Whatever order the original lists had is untouched; the order of the set comparisons is irrelevant. As the documentation says at a couple of points, List::Compare was designed to answer the question: Was this item seen in that list? Pure and simple. In what position in the list the item was seen is a question for a different module to answer.

    Jim Keenan

      The real-world production problems for which I originally developed List::Compare did not include lists of Perl references. The module was not designed to handle them and has never been tested against them.

      That's perfectly fine, but you don't mention that in your documentation. Either you handle them or you make it perfectly clear that you don't handle them. This is a very important point.

      As for figuring out if a list contains a reference ... what's wrong with grep { ref } @list?
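
A minimal sketch of that check, wrapped in a hypothetical helper called assert_no_refs (the name and the error message are made up for illustration; this is not part of List::Compare). It takes the same kind of arguments the constructor is shown taking in this thread, namely references to arrays, and dies if any element is itself a reference.

```perl
use strict;
use warnings;

# Hypothetical helper: die if any of the given lists contains a reference.
# Each argument is a reference to an array of list elements.
sub assert_no_refs {
    my @lists = @_;
    for my $i (0 .. $#lists) {
        # grep in scalar context returns the number of matching elements
        my $count = grep { ref } @{ $lists[$i] };
        die "List $i contains $count reference(s)\n" if $count;
    }
    return 1;
}

assert_no_refs([qw(a b c)], [qw(b c d)]);    # passes silently

# A list holding a code-reference is rejected.
my $ok = eval { assert_no_refs([qw(a b)], ['c', sub { 1 }]); 1 };
print "rejected: $@" unless $ok;
```

Note that ref is also true for blessed objects, so this check would reject objects that overload stringification as well as plain references.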

      I don't see any particular reason why a report on the intersection, union, etc., of two or more lists should come back in any particular order.

      If you were dealing with sets, then you would be correct. However, if I'm working with lists, I expect that the ordering property of lists would be maintained in every single action. map and grep maintain order. Your functions, to me, are in the same vein.

      If I want to find out "Is this item in that grouping?", I would consider that a set operation, if I'm using a module. Sets are intrinsically unordered.

      The question also isn't a matter of what position in the list a given item was seen. If you are saying @list3 = intersection(\@list1, \@list2);, I would assume (because it's not otherwise stated in the docs) that @list3 has the elements in the order seen in @list1. Essentially, intersection() would be written like so:

      sub intersection {
          my ($l1, $l2) = @_;
          my %l2;
          @l2{@$l2} = ();    # hash slice: one key per element of @$l2
          my @l3 = grep { exists $l2{$_} } @$l1;    # walk @$l1 in its own order
          return wantarray ? @l3 : \@l3;
      }

      That preserves the order. If I don't care about order, I should be using sets.
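
A runnable sketch of the order-preserving idea, for anyone who wants to try it. The function name ordered_intersection and the sample data are made up for illustration, and dropping repeated elements is an extra assumption of this sketch rather than something discussed above. Like any hash-based lookup, it stringifies its elements, so it is only meant for lists of plain strings.

```perl
use strict;
use warnings;

# Hypothetical helper: elements of @$l1 that also occur in @$l2,
# in the order they first appear in @$l1.
sub ordered_intersection {
    my ($l1, $l2) = @_;

    my %in_l2;
    @in_l2{@$l2} = ();    # one key per element of the second list

    my %emitted;          # assumption: report each element only once
    return grep { exists $in_l2{$_} && !$emitted{$_}++ } @$l1;
}

my @result = ordered_intersection(
    [qw(pear apple fig apple kiwi)],
    [qw(kiwi apple plum)],
);
print "@result\n";    # prints "apple kiwi"
```

The result follows the first list's order (apple before kiwi) even though the second list has them the other way round.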

      Now, you're wondering what the big deal is - most people wouldn't care. And, you'd be right. Except, some people will and it doesn't cost a lot to make it right.

      I shouldn't have to say this, but any code, unless otherwise stated, is untested
