Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

(tye)Re: Algorithm::Diff icult to use

by tye (Sage)
on Aug 22, 2002 at 07:42 UTC ( [id://191950]=note: print w/replies, xml ) Need Help??


in reply to Re: Algorithm::Diff icult to use
in thread Algorithm::Diff icult to use

Thanks again to everyone for the feedback.

First, the easy one. I'm not adding a prototype. The existing routines don't have prototypes and I only use prototypes in very specific situations and this isn't one of those.

While a LoL isn't very deeply nested, it does open up the possiblity for creating a very large number of very small anonymous arrays. Each array adds a chunk of overhead. For smalls arrays, this overhead becomes a significant percentage. So I like splitting up the return value some but would make it optional.

As for adding more fields for convenience, I'm reluctant for two reasons. The first is the same as above; it reduces the size of problem you can address with the same resources. The second is that it makes it harder to remember which index gives you back which bit of information.

In fact, this has made me realize that much of what I was returning is redundant. I could reduce it from 5 down to just 2 like so:

# @a @b @a @b # same? range range values values ( 0, 0=>0, 0=>-1, # a ! 1, 1=>2, 0=> 1, # b c = b c 0, 3=>2, 2=> 2, # ! d 1, 3=>3, 3=> 3, # e = e 0, 4=>4, 4=> 4, # f ! h 1, 5=>5, 5=> 5, # j = j 0, 6=>5, 6=> 6, # ! k 1, 6=>7, 7=> 8, # l m = l m 0, 8=>9, 9=>11, # n p ! r s t ); # @a @b @a @b # start start values values ( 0, 0, # = 0, 0, # a ! 1, 0, # b c = b c 3, 2, # ! d 3, 3, # e = e 4, 4, # f ! h 5, 5, # j = j 6, 6, # ! k 6, 7, # l m = l m 8, 9, # n p ! r s t 10, 12, # );

by just making the loop logic a bit more complex like so:

# old loop code ######################################## my $diff= easy( \@a, \@b ); while( @$diff ) { my( $same, $aMin, $aMax, $bMin, $bMax )= splice @$diff, 0, 5; # ... } # new loop code ######################################## my $diff= easy( \@a, \@b ); my $same= 0; while( @$diff ) { my( $aMin, $bMin, $aMax, $bMax )= @$diff; --$aMax; --$bMax; $same= 1 - $same; splice @$diff, 0, 2; # ... }

which gives me a neat idea!

I'll store the data very efficiently (a single list with 2 entries per 'hunk') but provide routines for traversing the list in a convenient manner (which also solves my 'naming' problem).

my $diff= Algorithm::Diff->new( \@a, \@b ); $diff->base( 1 ); # Return line numbers, not indices while( $diff->next() ) { next if $diff->same(); my $sep= ''; if( ! $diff->bRange() ) { sprintf "%d,%dd%d\n", $diff->get(qw( aMin aMax bMax )); } elsif( ! $diff->aRange() ) { sprint "%da%d,%d\n", $diff->get(qw( aMax bMin bMax )); } else { $sep= "---\n"; sprintf "%d,%dc%d%d\n", $diff->get(qw( aMin aMax bMin bMax )); } print "< $_" for $diff->aRange(); print $sep; print "> $_" for $diff->bRange(); }

Note how $diff->aRange() returns the lines falling into that range but, in a scalar context, returns the number of lines in that range (the $aLen you wanted) which also happens to be "false" only if there are no lines in that range (like the bits of $diff you asked for).

Other methods would include

  • $diff->prev() to move along the chunks in the other direction
  • $diff->end() to reset the iterator prematurely so subsequently next() would go to the top while prev() would got to the bottom.
  • $diff->aMin() would be the same as $diff->get("aMin"), etc.
Takes about 60% less RAM but is much more flexible. Thanks!

Note, in case you plan to complain about me optimizing for RAM size by resorting to OO methods, which run a bit slower than regular subroutines (or direct manipulation of a data structure). It is a good trade-off, IMO, because if your problem requires 30% more RAM than what you have available, then either it just fails or it runs really, really slowly. While, it running 10% slower is a minor problem in comparison.

        - tye (but my friends call me "Tye")

Replies are listed 'Best First'.
Re: (tye)Re: Algorithm::Diff icult to use
by bart (Canon) on Aug 22, 2002 at 09:35 UTC
    # old loop code ######################################## my $diff= easy( \@a, \@b ); while( @$diff ) { my( $same, $aMin, $aMax, $bMin, $bMax )= splice @$diff, 0, 5; # ... } # new loop code ######################################## my $diff= easy( \@a, \@b ); my $same= 0; while( @$diff ) { my( $aMin, $bMin, $aMax, $bMax )= @$diff; # ... splice @$diff, 0, 2; # ... }

    Ouch! Precisely what I wanted to avoid: a modification in the number of items. Well, you could export a constant for this number, whether it be 2 or 5.

    BTW if memory efficiency is what you're worried about, why don't you just use 1 scalar for each item? After all, these are all just integers, so messing with pack()/unpack() or vec() would allow you to combine them, in one scalar. Only an 8 (or 20) byte string per item! The joy! The savings! (The pain!)

Re^3: Algorithm::Diff icult to use
by Aristotle (Chancellor) on Aug 22, 2002 at 10:12 UTC
    You are right that the changes in the structure would have cost more overhead, but I was proposing them for the convenience and readability they'd provide. I was going to interject some points in that vein as I was reading; until you presented the OO interface. That's perfect as far as I'm concerned.

    Makeshifts last the longest.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://191950]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (3)
As of 2024-04-16 19:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found