Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

compare initial

by dideod.yang (Sexton)
on Jul 11, 2018 at 11:33 UTC ( [id://1218306]=perlquestion: print w/replies, xml ) Need Help??

dideod.yang has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks I need your help about perl. I will compare three values. Ther are $a=TTTATTT, $b=TTTTTTT, $c=TBTTTTT. When I compare $a and $b, then I want to catch only "A". Also When I compare $b and $c, then I want to catch "B". Of course below script I can comapre using function "split". but there are so many loop that script's performance is really bad... Can you help me about that issue?? Performance is the best important thing I think.. thank you :)
@a = split(//,$a); @b = split(//,$b); @c = split(//,$c); foreach $a(@a){$test{$a}++} foreach $b(@b){$test($b)++} foreach $c(@c){$test($c)++) foreach $test(keys %test){if($test{$test} eq 1){print "only one : $tes +t\n"}}

Replies are listed 'Best First'.
Re: compare initial
by tybalt89 (Monsignor) on Jul 11, 2018 at 13:26 UTC
    #!/usr/bin/perl # https://perlmonks.org/?node_id=1218306 use strict; use warnings; my $a = 'TTTATTT'; my $b = 'TTTTTTT'; my $c = 'TBTTTTT'; print "compare $a to $b, difference is ", $a =~ s/[$b]+//gr, $b =~ s/[$a]+//gr, "\n"; print "compare $b to $c, difference is ", $b =~ s/[$c]+//gr, $c =~ s/[$b]+//gr, "\n";

    Outputs:

    compare TTTATTT to TTTTTTT, difference is A compare TTTTTTT to TBTTTTT, difference is B
      Now that’s slick ...   “++
Re: compare initial
by haukex (Archbishop) on Jul 11, 2018 at 12:05 UTC

    The code you've shown does not compile, and is hard to read. Please take a look at SSCCE, perltidy, and Use strict and warnings.

    When I compare $a and $b, then I want to catch only "A". Also When I compare $b and $c, then I want to catch "B".

    But that's not really what your code is doing - it's bunching all the characters from all the strings together into one hash, and then looking for unique characters. Could you explain your requirements, along with more examples?

    but there are so many loop that script's performance is really bad

    When asking questions about performance, it's useful to provide as much relevant information as possible. How big is your input data? What does "really bad" mean - how long does it take to run? And how quickly do you need it to run? Is the code you're showing really the code you're running? Again, please provide an SSCCE that is representative of what you're actually doing.

    Until you give more details, I'll just give one hint: If you have two strings of the same length, you can do an XOR operation, as in $x^$y, and the resulting string will have NUL bytes ("\0") wherever the strings were identical, and non-NUL bytes where they differ. But again, if this is "better" depends on your actual requirements.

Re: compare initial
by tybalt89 (Monsignor) on Jul 11, 2018 at 22:53 UTC

    Assuming the strings have the same length.

    #!/usr/bin/perl # https://perlmonks.org/?node_id=1218306 use strict; use warnings; use Algorithm::Diff qw(traverse_balanced); compare( 'TTTATTT', 'TTTTTTT' ); compare( 'TTTTTTT', 'TBTTTTT' ); sub compare { my (@from, @to); print "compare '$_[0]' to '$_[1]', differences are"; traverse_balanced( [ @from = split //, $_[0] ], [ @to = split //, $_ +[1] ], { CHANGE => sub { print " '$from[$_[0]]'\@$_[0]->'$to[$_[1]]'\@$_[1] +" }, } ); print "\n"; }

    Outputs:

    compare 'TTTATTT' to 'TTTTTTT', differences are 'A'@3->'T'@3 compare 'TTTTTTT' to 'TBTTTTT', differences are 'T'@1->'B'@1
Re: compare initial
by AnomalousMonk (Archbishop) on Jul 11, 2018 at 14:05 UTC

    Further to the hint in haukex's post:   The bitwise-xor of equal length strings (update: assumed to be ASCII strings; see sundialsvc4's point here++) is a neat differencing trick. Here's an example that extracts zero-based offsets of differing characters:

    c:\@Work\Perl\monks>perl -wMstrict -le "my ($x, $y, $z) = qw(TTTATTT TTTTTTT TBTTTTT); ;; for my $ar ([ $x, $y ], [ $x, $z ], [ $y, $z ], ) { my ($p, $q) = @$ar; my $d = $p ^ $q; my @offsets; push @offsets, $+[0]-1 while $d =~ m{ \G \x00* [^\x00] }xmsg; print $p; print $q; if (@offsets) { printf qq{diff(s) at offset(s) %s \n}, join q{, }, @offsets; } else { print qq{no diffs \n}; } } " TTTATTT TTTTTTT diff(s) at offset(s) 3 TTTATTT TBTTTTT diff(s) at offset(s) 1, 3 TTTTTTT TBTTTTT diff(s) at offset(s) 1
    BTW: I, too, find your OP a bit vague.

    Update: Arrrgh! Use  m{ [^\x00] }xmsg instead of  m{ \G \x00* [^\x00] }xmsg (simpler == better).
    Double Arrrgh! Even simpler: Instead of
        push @offsets, $+[0]-1 while $d =~ m{ [^\x00] }xmsg;
    use
        push @offsets, $-[0] while $d =~ m{ [^\x00] }xmsg;
    (See perlvar for  @+ and  @- regex special variables.)


    Give a man a fish:  <%-{-{-{-<

      The only possible fly in that ointment would be Unicode (UTF-8,16) because these results are necessarily byte offsets.
Re: compare initial
by BillKSmith (Monsignor) on Jul 11, 2018 at 15:14 UTC
    The operation that you describe is called "symmetric difference" of sets. A quick search of CPAN finds Set::Tiny. It provides links to several other modules which you may prefer.
    Bill
Re: compare initial
by mr_mischief (Monsignor) on Jul 11, 2018 at 21:30 UTC

    First, you'll want to use eq when testing equality of strings in Perl and == when testing equality of numbers. Second, $a and $b are special variables in Perl used in sorting, so don't name normal things that way.

    Your specification isn't quite clear. It seems to me you're looking at generations and you want to know when generation n+1 shows a mutation of one single base from generation n. There's https://bioperl.org/ for that sort of work if you need it. There are set libraries as BillKSmith mentioned as well in case that's what you need.

    As far as your specific case, if you're indeed wanting to compare A to B and B to C but not A to C, I have some working code I threw together for you. Some loops (whether explicit in your code or implicit in the language or in a library beneath your code) are necessary because you're doing repeated checks over different combinations of inputs. One can try to minimize the number of such loops, but loops (possibly mixed with recursion) is how these combinations get checked. If what you actually need is every string tested against every other string then you need even another loop and, yes, it will complete even more slowly given all else stays equal. Forking, threading, IPC, flow-based programming, and similar topics are unnecessary for three strings of seven characters, but enough very long strings might require something other than a simple single-threaded, single process approach to run in an acceptable amount of time.

    This code is quite particular to the problem as I restated it. There's probably a dozen better ways to do this, and some of those will be more generalizable.

    #!perl use strict; use warnings; my $difference = 1; my @generations = ( 'TTTATTT', 'TTTTTTTT', 'TBTTTTT' ); my %mismatch; my $times = $difference == 1 ? 'time' : 'times'; my $columns = $difference == 1 ? 'column: ' : "columns:\n"; my $length = length $generations[0]; for ( my $column = 0 ; $column < $length ; $column++ ) { for ( my $gen = 1 ; $gen < scalar @generations ; $gen++ ) { my $last = substr $generations[ $gen-1 ], $column, 1; my $current = substr $generations[ $gen ], $column, 1; push @{ $mismatch{ $gen } }, [ $column, $last, $current ] unle +ss $last eq $current; } } my $out = ''; for my $mm ( sort keys %mismatch ) { if ( scalar @{ $mismatch{ $mm } } == $difference ) { $out .= sprintf "mismatched %d %s between gens %d and %d at %s +", $difference, $times, $mm, $mm-1, $columns; foreach my $c_l_c ( @{ $mismatch{ $mm } } ) { $out .= sprintf " %d ( %s and %s ),\n", @$c_l_c; } $out .= "\n"; } } print $out;

    If you need the amount of difference to change or the input strings to change, that's not at all difficult. They could come from the command line or configuration files or a data file of some sort.

Re: compare initial (updated)
by AnomalousMonk (Archbishop) on Jul 11, 2018 at 22:20 UTC

    I still don't really understand what you want from the OPed code, but here's an approach that produces the same result and which may (or may not) be considered "neater." I've made no attempt to Benchmark this code. Note that the strings need not all be the same length. Also, a  '0' character (which tests false) is handled properly.

    c:\@Work\Perl\monks>perl -wMstrict -le "my $x = 'TTT0TTTTT'; my $y = 'TTTTTTTT'; my $z = 'TBTTTTT'; ;; print qq{only one of '$_'} for one_only($x, $y, $z); ;; sub one_only { my %freq; return grep $freq{$_} == 1, map ++$freq{$_} && $_, map split('', $_), @_ ; } " only one of '0' only one of 'B'

    Update: Actually, here's an equivalent variation based on substr that might actually be significantly faster than split-ing every string into countless characters. (Again, I haven't Benchmark-ed.)


    Give a man a fish:  <%-{-{-{-<

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1218306]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others surveying the Monastery: (4)
As of 2024-04-25 13:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found