(tye)Re: Algorithm::Diff icult to use

Thanks again to everyone for the feedback.

First, the easy one. I'm not adding a prototype. The existing routines don't have prototypes and I only use prototypes in very specific situations and this isn't one of those.

While a LoL isn't very deeply nested, it does open up the possiblity for creating a very large number of very small anonymous arrays. Each array adds a chunk of overhead. For smalls arrays, this overhead becomes a significant percentage. So I like splitting up the return value some but would make it optional.

As for adding more fields for convenience, I'm reluctant for two reasons. The first is the same as above; it reduces the size of problem you can address with the same resources. The second is that it makes it harder to remember which index gives you back which bit of information.

In fact, this has made me realize that much of what I was returning is redundant. I could reduce it from 5 down to just 2 like so:

    #         @a      @b        @a       @b
    # same?  range   range    values   values
    (   0,   0=>0,   0=>-1,  #    a  !
        1,   1=>2,   0=> 1,  #  b c  =  b c
        0,   3=>2,   2=> 2,  #       !  d
        1,   3=>3,   3=> 3,  #    e  =  e
        0,   4=>4,   4=> 4,  #    f  !  h
        1,   5=>5,   5=> 5,  #    j  =  j
        0,   6=>5,   6=> 6,  #       !  k
        1,   6=>7,   7=> 8,  #  l m  =  l m
        0,   8=>9,   9=>11,  #  n p  !  r s t
    );
    #         @a      @b        @a       @b
    #        start   start    values   values
    (        0,      0,      #       =
             0,      0,      #    a  !
             1,      0,      #  b c  =  b c
             3,      2,      #       !  d
             3,      3,      #    e  =  e
             4,      4,      #    f  !  h
             5,      5,      #    j  =  j
             6,      6,      #       !  k
             6,      7,      #  l m  =  l m
             8,      9,      #  n p  !  r s t
            10,     12,      #
    );
[download]

by just making the loop logic a bit more complex like so:

    # old loop code
    ########################################
    my $diff= easy( \@a, \@b );
    while(  @$diff  ) {
        my( $same, $aMin, $aMax, $bMin, $bMax )=
            splice @$diff, 0, 5;
        # ...
    }

    # new loop code
    ########################################
    my $diff= easy( \@a, \@b );
    my $same= 0;
    while(  @$diff  ) {
        my( $aMin, $bMin, $aMax, $bMax )= @$diff;
        --$aMax; --$bMax;
        $same= 1 - $same;
        splice @$diff, 0, 2;
        # ...
    }
[download]

which gives me a neat idea!

I'll store the data very efficiently (a single list with 2 entries per 'hunk') but provide routines for traversing the list in a convenient manner (which also solves my 'naming' problem).

    my $diff= Algorithm::Diff->new( \@a, \@b );
    $diff->base( 1 );   # Return line numbers, not indices
    while(  $diff->next()  ) {
        next   if  $diff->same();
        my $sep= '';
        if(  ! $diff->bRange()  ) {
            sprintf "%d,%dd%d\n",
                $diff->get(qw( aMin aMax bMax ));
        } elsif(  ! $diff->aRange()  ) {
            sprint "%da%d,%d\n",
                $diff->get(qw( aMax bMin bMax ));
        } else {
            $sep= "---\n";
            sprintf "%d,%dc%d%d\n",
                $diff->get(qw( aMin aMax bMin bMax ));
        }
        print "< $_"   for  $diff->aRange();
        print $sep;
        print "> $_"   for  $diff->bRange();
    }
[download]

Note how $diff->aRange() returns the lines falling into that range but, in a scalar context, returns the number of lines in that range (the $aLen you wanted) which also happens to be "false" only if there are no lines in that range (like the bits of $diff you asked for).

Other methods would include

$diff->prev() to move along the chunks in the other direction
$diff->end() to reset the iterator prematurely so subsequently next() would go to the top while prev() would got to the bottom.
$diff->aMin() would be the same as $diff->get("aMin"), etc.

Takes about 60% less RAM but is much more flexible. Thanks!

Note, in case you plan to complain about me optimizing for RAM size by resorting to OO methods, which run a bit slower than regular subroutines (or direct manipulation of a data structure). It is a good trade-off, IMO, because if your problem requires 30% more RAM than what you have available, then either it just fails or it runs really, really slowly. While, it running 10% slower is a minor problem in comparison.

- tye (but my friends call me "Tye")

Comment on (tye)Re: Algorithm::Diff icult to use Select or Download Code

Replies are listed 'Best First'.
Re: (tye)Re: Algorithm::Diff icult to use by bart (Canon) on Aug 22, 2002 at 09:35 UTC
`# old loop code ######################################## my $diff= easy( \@a, \@b ); while( @$diff ) { my( $same, $aMin, $aMax, $bMin, $bMax )= splice @$diff, 0, 5; # ... } # new loop code ######################################## my $diff= easy( \@a, \@b ); my $same= 0; while( @$diff ) { my( $aMin, $bMin, $aMax, $bMax )= @$diff; # ... splice @$diff, 0, 2; # ... }` [download] Ouch! Precisely what I wanted to avoid: a modification in the number of items. Well, you could export a constant for this number, whether it be 2 or 5. BTW if memory efficiency is what you're worried about, why don't you just use 1 scalar for each item? After all, these are all just integers, so messing with pack()/unpack() or vec() would allow you to combine them, in one scalar. Only an 8 (or 20) byte string per item! The joy! The savings! (The pain!)	[reply] [d/l]
Re^3: Algorithm::Diff icult to use by Aristotle (Chancellor) on Aug 22, 2002 at 10:12 UTC
You are right that the changes in the structure would have cost more overhead, but I was proposing them for the convenience and readability they'd provide. I was going to interject some points in that vein as I was reading; until you presented the OO interface. That's perfect as far as I'm concerned. Makeshifts last the longest.	[reply]


Problems? Is your data what you think it is?
	PerlMonks