compare initial

dideod.yang has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: compare initial by tybalt89 (Monsignor) on Jul 11, 2018 at 13:26 UTC
`#!/usr/bin/perl # https://perlmonks.org/?node_id=1218306 use strict; use warnings; my $a = 'TTTATTT'; my $b = 'TTTTTTT'; my $c = 'TBTTTTT'; print "compare $a to $b, difference is ", $a =~ s/[$b]+//gr, $b =~ s/[$a]+//gr, "\n"; print "compare $b to $c, difference is ", $b =~ s/[$c]+//gr, $c =~ s/[$b]+//gr, "\n";` [download] Outputs: `compare TTTATTT to TTTTTTT, difference is A compare TTTTTTT to TBTTTTT, difference is B` [download]	[reply] [d/l] [select]
Re^2: compare initial by sundialsvc4 (Abbot) on Jul 11, 2018 at 14:54 UTC
Now that’s slick ... “`++`”
Re: compare initial by haukex (Archbishop) on Jul 11, 2018 at 12:05 UTC
The code you've shown does not compile, and is hard to read. Please take a look at SSCCE, perltidy, and Use strict and warnings. When I compare $a and $b, then I want to catch only "A". Also When I compare $b and $c, then I want to catch "B". But that's not really what your code is doing - it's bunching all the characters from all the strings together into one hash, and then looking for unique characters. Could you explain your requirements, along with more examples? but there are so many loop that script's performance is really bad When asking questions about performance, it's useful to provide as much relevant information as possible. How big is your input data? What does "really bad" mean - how long does it take to run? And how quickly do you need it to run? Is the code you're showing really the code you're running? Again, please provide an SSCCE that is representative of what you're actually doing. Until you give more details, I'll just give one hint: If you have two strings of the same length, you can do an XOR operation, as in `$x^$y`, and the resulting string will have NUL bytes (`"\0"`) wherever the strings were identical, and non-NUL bytes where they differ. But again, if this is "better" depends on your actual requirements.	[reply] [d/l] [select]
Re: compare initial by tybalt89 (Monsignor) on Jul 11, 2018 at 22:53 UTC
Assuming the strings have the same length. `#!/usr/bin/perl # https://perlmonks.org/?node_id=1218306 use strict; use warnings; use Algorithm::Diff qw(traverse_balanced); compare( 'TTTATTT', 'TTTTTTT' ); compare( 'TTTTTTT', 'TBTTTTT' ); sub compare { my (@from, @to); print "compare '$_[0]' to '$_[1]', differences are"; traverse_balanced( [ @from = split //, $_[0] ], [ @to = split //, $_ +[1] ], { CHANGE => sub { print " '$from[$_[0]]'\@$_[0]->'$to[$_[1]]'\@$_[1] +" }, } ); print "\n"; }` [download] Outputs: `compare 'TTTATTT' to 'TTTTTTT', differences are 'A'@3->'T'@3 compare 'TTTTTTT' to 'TBTTTTT', differences are 'T'@1->'B'@1` [download]	[reply] [d/l] [select]
Re: compare initial by AnomalousMonk (Archbishop) on Jul 11, 2018 at 14:05 UTC
Further to the hint in haukex's post: The bitwise-xor of equal length strings (update: assumed to be ASCII strings; see sundialsvc4's point here++) is a neat differencing trick. Here's an example that extracts zero-based offsets of differing characters: c:\@Work\Perl\monks>perl -wMstrict -le "my ($x, $y, $z) = qw(TTTATTT TTTTTTT TBTTTTT); ;; for my $ar ([ $x, $y ], [ $x, $z ], [ $y, $z ], ) { my ($p, $q) = @$ar; my $d = $p ^ $q; my @offsets; push @offsets, $+[0]-1 while $d =~ m{ \G \x00* [^\x00] }xmsg; print $p; print $q; if (@offsets) { printf qq{diff(s) at offset(s) %s \n}, join q{, }, @offsets; } else { print qq{no diffs \n}; } } " TTTATTT TTTTTTT diff(s) at offset(s) 3 TTTATTT TBTTTTT diff(s) at offset(s) 1, 3 TTTTTTT TBTTTTT diff(s) at offset(s) 1 [download] BTW: I, too, find your OP a bit vague. Update: ~~Arrrgh! Use `m{ [^\x00] }xmsg` instead of `m{ \G \x00* [^\x00] }xmsg` (simpler == better).~~ Double Arrrgh! Even simpler: Instead of `push @offsets, $+[0]-1 while $d =~ m{ [^\x00] }xmsg;` use `push @offsets, $-[0] while $d =~ m{ [^\x00] }xmsg;` (See perlvar for `@+` and `@-` regex special variables.) Give a man a fish: `<%-{-{-{-<`	[reply] [d/l] [select]
Re^2: compare initial by sundialsvc4 (Abbot) on Jul 11, 2018 at 14:56 UTC
The only possible fly in that ointment would be Unicode (UTF-8,16) because these results are necessarily byte offsets.
Re: compare initial by BillKSmith (Monsignor) on Jul 11, 2018 at 15:14 UTC
The operation that you describe is called "symmetric difference" of sets. A quick search of CPAN finds Set::Tiny. It provides links to several other modules which you may prefer. Bill	[reply]
Re: compare initial by mr_mischief (Monsignor) on Jul 11, 2018 at 21:30 UTC
First, you'll want to use `eq` when testing equality of strings in Perl and `==` when testing equality of numbers. Second, `$a` and `$b` are special variables in Perl used in sorting, so don't name normal things that way. Your specification isn't quite clear. It seems to me you're looking at generations and you want to know when generation n+1 shows a mutation of one single base from generation n. There's https://bioperl.org/ for that sort of work if you need it. There are set libraries as BillKSmith mentioned as well in case that's what you need. As far as your specific case, if you're indeed wanting to compare A to B and B to C but not A to C, I have some working code I threw together for you. Some loops (whether explicit in your code or implicit in the language or in a library beneath your code) are necessary because you're doing repeated checks over different combinations of inputs. One can try to minimize the number of such loops, but loops (possibly mixed with recursion) is how these combinations get checked. If what you actually need is every string tested against every other string then you need even another loop and, yes, it will complete even more slowly given all else stays equal. Forking, threading, IPC, flow-based programming, and similar topics are unnecessary for three strings of seven characters, but enough very long strings might require something other than a simple single-threaded, single process approach to run in an acceptable amount of time. This code is quite particular to the problem as I restated it. There's probably a dozen better ways to do this, and some of those will be more generalizable. #!perl use strict; use warnings; my $difference = 1; my @generations = ( 'TTTATTT', 'TTTTTTTT', 'TBTTTTT' ); my %mismatch; my $times = $difference == 1 ? 'time' : 'times'; my $columns = $difference == 1 ? 'column: ' : "columns:\n"; my $length = length $generations[0]; for ( my $column = 0 ; $column < $length ; $column++ ) { for ( my $gen = 1 ; $gen < scalar @generations ; $gen++ ) { my $last = substr $generations[ $gen-1 ], $column, 1; my $current = substr $generations[ $gen ], $column, 1; push @{ $mismatch{ $gen } }, [ $column, $last, $current ] unle +ss $last eq $current; } } my $out = ''; for my $mm ( sort keys %mismatch ) { if ( scalar @{ $mismatch{ $mm } } == $difference ) { $out .= sprintf "mismatched %d %s between gens %d and %d at %s +", $difference, $times, $mm, $mm-1, $columns; foreach my $c_l_c ( @{ $mismatch{ $mm } } ) { $out .= sprintf " %d ( %s and %s ),\n", @$c_l_c; } $out .= "\n"; } } print $out; [download] If you need the amount of difference to change or the input strings to change, that's not at all difficult. They could come from the command line or configuration files or a data file of some sort.	[reply] [d/l] [select]
Re: compare initial (updated) by AnomalousMonk (Archbishop) on Jul 11, 2018 at 22:20 UTC
I still don't really understand what you want from the OPed code, but here's an approach that produces the same result and which may (or may not) be considered "neater." I've made no attempt to Benchmark this code. Note that the strings need not all be the same length. Also, a `'0'` character (which tests false) is handled properly. `c:\@Work\Perl\monks>perl -wMstrict -le "my $x = 'TTT0TTTTT'; my $y = 'TTTTTTTT'; my $z = 'TBTTTTT'; ;; print qq{only one of '$_'} for one_only($x, $y, $z); ;; sub one_only { my %freq; return grep $freq{$_} == 1, map ++$freq{$_} && $_, map split('', $_), @_ ; } " only one of '0' only one of 'B'` [download] Update: Actually, here's an equivalent variation based on substr that might actually be significantly faster than split-ing every string into countless characters. (Again, I haven't Benchmark-ed.) Read more... (3 kB) Give a man a fish: `<%-{-{-{-<`	[reply] [d/l] [select]


No such thing as a small change
	PerlMonks