Re: CPAN Module to determing overlap of 2 lists?

#!/usr/bin/perl

use strict; # https://perlmonks.org/?node_id=11120564
use warnings;

my $file1 = <<END;
one
two
three
four
five
six
END

my $file2 = <<END;
two
three
four
five
six
seven
END

my $marker = '***MARKER***'; # something not in either string

my $combine = "$file1$marker$file2" =~ s/(.*)\K\Q$marker\E\1//sr;

print $combine;

Outputs:

one
two
three
four
five
six
seven


Re^2: CPAN Module to determing overlap of 2 lists? by wazat (Monk) on Aug 11, 2020 at 19:03 UTC
I hadn't thought much about a regex solution. Ensuring that the match starts at the beginning of a line requires a small tweak: `my $combine = "$file1$marker$file2" =~ s/(?:\A|\n)(.*)\K\Q$marker\E\1//sr;`
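A minimal sketch of what the anchor changes, assuming a hypothetical pair of inputs in which the second string begins with a fragment of the first string's last line. The sub names `merge_unanchored` and `merge_anchored` and the sample data are made up for this illustration; the `/r` modifier needs Perl 5.14 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $marker = '***MARKER***';    # assumed absent from both inputs

# Original substitution: the overlap may start in the middle of a line.
sub merge_unanchored {
    my ($first, $second) = @_;
    return "$first$marker$second" =~ s/(.*)\K\Q$marker\E\1//sr;
}

# Tweaked substitution: the overlap must start at the beginning of a line.
sub merge_anchored {
    my ($first, $second) = @_;
    return "$first$marker$second" =~ s/(?:\A|\n)(.*)\K\Q$marker\E\1//sr;
}

# Hypothetical inputs: "wo\n" is only a fragment of the line "two".
my $first  = "one\ntwo\n";
my $second = "wo\nthree\n";

print merge_unanchored($first, $second);    # the fragment "wo" is swallowed
print "--\n";
print merge_anchored($first, $second);      # both inputs survive intact
```

One thing to watch in the anchored form: with no overlap it can only match the empty string right after a final newline, so if the first string does not end in a newline the substitution fails and the marker stays in the result.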
Re^3: CPAN Module to determing overlap of 2 lists? by LanX (Saint) on Aug 12, 2020 at 18:27 UTC
3 suggestions:

- You don't need complete lines to make it work, but anchoring to the line start might prove to be faster.
- I'd include characters below ASCII 8 in the "marker" to play safe; see also the discussion surrounding the similar `$;`.
- You might be interested to check with `re` "debug" how the backtracking of the `.*` submatch performs. I'd guess you'd prefer it to grow from right to left instead of shrinking from left to right. I know the regex engine can do this depending on the anchors.

I haven't checked the last point since performance might not be your biggest issue.

Cheers Rolf
(addicted to the Perl Programming Language :) Wikisyntax for the Monastery
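A small sketch of the marker suggestion, assuming plain-text inputs: build the marker from low control characters and refuse to continue if it already occurs in either string. The helper name `pick_marker`, the particular byte sequence, and the sample data are illustrative choices, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return a control-character marker, or die if the
# marker already occurs in one of the inputs.
sub pick_marker {
    my @texts  = @_;
    my $marker = "\x01\x02\x03";    # arbitrary bytes below ASCII 8
    for my $text (@texts) {
        die "marker occurs in input\n" if index($text, $marker) >= 0;
    }
    return $marker;
}

my $file1  = "one\ntwo\nthree\n";
my $file2  = "two\nthree\nfour\n";
my $marker = pick_marker($file1, $file2);

# Uncomment to have the regex engine trace its work on STDERR:
# use re 'debug';
my $combine = "$file1$marker$file2" =~ s/(?:\A|\n)(.*)\K\Q$marker\E\1//sr;
print $combine;
```

Checking the inputs up front turns a silent wrong merge into an explicit failure, which is the point of playing safe with the marker.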
Re^4: CPAN Module to determing overlap of 2 lists? (updated) by LanX (Saint) on Aug 12, 2020 at 21:08 UTC
> grow from right to left instead of shrinking from left to right

This might be much faster if the overlaps are considerably smaller than the total files. ~~And it avoids any semipredicate problem with $marker.~~°

(Not heavily tested, please check edge cases.)

use strict;
use warnings;

my $file1 = join "\n", qw( a b c d c );
my $file2 = join "\n", qw( c d c x );

my $content = "$file2\n$file1";
# $1: text at the start of $content that recurs at its very end
$content =~ /^(.*)\n.*\1$/s;

(substr $file2, 0, length $1) = $file1;

print $file2;

Outputs:

a
b
c
d
c
x

Cheers Rolf

°) Unfortunately it doesn't; proof left to the interested reader.
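A hedged sketch that wraps the marker-free idea in a sub and guards against two edge cases: the regex not matching at all, and the captured text reaching past the end of the second string. The name `merge_tail_head` and both guards are assumptions added for illustration, not part of the post. When the capture overruns, the sketch simply gives up and concatenates; it does not go on to look for a shorter overlap.

```perl
use strict;
use warnings;

# Hypothetical wrapper: merge $head and $tail where the end of $head
# repeats the start of $tail (neither is expected to end in a newline).
sub merge_tail_head {
    my ($head, $tail) = @_;
    my $content = "$tail\n$head";
    if ($content =~ /^(.*)\n.*\1$/s and length($1) <= length($tail)) {
        # Replace the overlapping prefix of $tail with all of $head.
        substr($tail, 0, length($1)) = $head;
        return $tail;
    }
    return "$head\n$tail";    # no usable overlap: plain concatenation
}

print merge_tail_head(join("\n", qw( a b c d c )), join("\n", qw( c d c x ))), "\n";
```

An input that trips the second guard is `merge_tail_head("b\nc\na\nb", "a")`: the capture becomes `"a\nb"`, which is longer than the second string, so the unguarded assignment would overwrite it entirely.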
Re^4: CPAN Module to determing overlap of 2 lists? by wazat (Monk) on Aug 13, 2020 at 00:50 UTC
I added the line start anchor as I wanted to match whole lines. Agreed, assuming text files, a more "binary" marker is better. Currently I feel the regex solution is interesting, but still not my first choice. I'll dig deeper if I start profiling.
Re^5: CPAN Module to determing overlap of 2 lists? by LanX (Saint) on Aug 13, 2020 at 14:15 UTC

