http://qs321.pair.com?node_id=487086

monkfan has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to put brackets around the occurrences of a given set of substrings in a string, merging overlapping occurrences into a single bracketed region. Like this:

    #  String                  Substrings          Desired Result
    1  CCCATCTGTCCTTATTTGCTG   ATCTG ATTTG         CCC[ATCTG]TCCTT[ATTTG]CTG
    2  ACCCATCTGTCCTTGGCCAT    CCATC               AC[CCATC]TGTCCTTGGCCAT
    3  CCACCAGCACCTGTC         CCACC CCAGC GCACC   [CCACCAGCACC]TGTC
    4  CCCAACACCTGCTGCCT       CCAAC ACACC         C[CCAACACC]TGCTGCCT

Originally posted as a Categorized Question.

Re: How do I put bracket on substring(s) of a string?
by monkfan (Curate) on Aug 27, 2005 at 03:27 UTC
    This is an attempt to preserve a thread from SoPW (Seekers of Perl Wisdom) that I think could be useful to others.

    The thread already has some excellent answers, so feel free to read it in SoPW. Please don't misunderstand me: I honestly think this can be useful, and I am not trying to boost my XP here or in the original thread.

    One possible answer was given by the good sgifford:

    #!/usr/bin/perl -w
    use strict;

    my $s1 = 'CCCATCTGTCCTTATTTGCTG'; my @a1 = qw(ATCTG ATTTG);
    my $s2 = 'ACCCATCTGTCCTTGGCCAT';  my @a2 = qw(CCATC);
    my $s3 = 'CCACCAGCACCTGTC';       my @a3 = qw(CCACC CCAGC GCACC);
    my $s4 = 'CCCAACACCTGCTGCCT';     my @a4 = qw(CCAAC ACACC);

    # put_bracket returns the new string, so print it
    print put_bracket( $s1, \@a1 ), "\n";
    print put_bracket( $s2, \@a2 ), "\n";
    print put_bracket( $s3, \@a3 ), "\n";
    print put_bracket( $s4, \@a4 ), "\n";

    sub put_bracket {
        my ( $str, $ar ) = @_;
        foreach my $subs (@$ar) {
            # Construct a regexp with [\[\]] between all the letters
            my $newsub = join '[\[\]]?', split //, $subs;
            $str =~ s/($newsub)/[$1]/g;
        }

        # Now de-nest the brackets in the string
        my $depth  = 0;
        my $newstr = '';
        foreach my $c ( split //, $str ) {
            if ( $c eq '[' ) {
                $newstr .= $c if $depth++ == 0;
            }
            elsif ( $c eq ']' ) {
                $newstr .= $c if --$depth == 0;
            }
            else {
                $newstr .= $c;
            }
        }
        return $newstr;
    }
    More answers can be found in the original thread. Please don't vote for this root post; vote for the remarkable answers you will find there instead.
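A quick way to see why the regexp built in sgifford's answer works: the optional `[\[\]]?` between letters lets a later substring still match after an earlier substring has already inserted a bracket into it. The sketch below reuses the answer's variable names and takes case 3 just after CCACC has been bracketed; it prints the constructed pattern and the intermediate, still-nested string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the same kind of pattern as in the answer: an optional
# '[' or ']' is allowed between every pair of letters.
my $subs   = 'CCAGC';
my $newsub = join '[\[\]]?', split //, $subs;
print "$newsub\n";   # C[\[\]]?C[\[\]]?A[\[\]]?G[\[\]]?C

# Case 3 after CCACC has been bracketed: CCAGC is still found because
# the pattern tolerates the ']' that now sits inside it.
my $str = '[CCACC]AGCACCTGTC';
$str =~ s/($newsub)/[$1]/g;
print "$str\n";      # [CCA[CC]AGC]ACCTGTC
```

The de-nesting loop in the answer then keeps only the outermost `[` and `]`. Note that the pattern tolerates only a single bracket character between two letters.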
Re: How do I put bracket on substring(s) of a string?
by Roy Johnson (Monsignor) on Sep 10, 2008 at 17:49 UTC
    #!perl
    use strict;
    use warnings;

    my @testdata = (
        [ 'CCCATCTGTCCTTATTTGCTG', [qw(ATCTG ATTTG)] ],
        [ 'ACCCATCTGTCCTTGGCCAT',  [qw(CCATC)] ],
        [ 'CCACCAGCACCTGTC',       [qw(CCACC CCAGC GCACC)] ],
        [ 'CCCAACACCTGCTGCCT',     [qw(CCAAC ACACC)] ],
    );

    for (@testdata) {
        my ( $str, $pats ) = @$_;
        print put_bracket( $str, @$pats ), "\n";
    }

    sub put_bracket {
        my $str = shift;

        # Combine multiple match strings into alternations
        my $rx = join '|', @_;
        my @brackets;

        # Store bracket points for every match; the zero-width
        # lookahead lets overlapping matches be found too
        while ( $str =~ /(?=($rx))/g ) {
            push @brackets, [ $-[0], length($1) + $-[0] ];
        }

        # Condense overlapping brackets
        for my $i ( 0 .. $#brackets - 1 ) {
            if ( $brackets[$i][1] >= $brackets[ $i + 1 ][0] ) {
                $brackets[ $i + 1 ][0] = $brackets[$i][0];
                # Keep the larger end, in case the next match lies inside this one
                $brackets[ $i + 1 ][1] = $brackets[$i][1]
                    if $brackets[$i][1] > $brackets[ $i + 1 ][1];
                @{ $brackets[$i] } = ();
            }
        }

        # Apply the brackets (from back to front)
        while (@brackets) {
            my $br = pop @brackets;
            next unless @$br;
            substr( $str, $br->[1], 0 ) = ']';
            substr( $str, $br->[0], 0 ) = '[';
        }
        return $str;
    }
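The condensing loop in this answer relies on the matches arriving sorted by start position, which the left-to-right scan guarantees. A standalone version of the same merge step is sketched below. `merge_intervals` is a hypothetical name; intervals are assumed to be half-open `[start, end)` pairs as in the answer, and overlapping or touching intervals are merged, matching the answer's `>=` test. Sorting first and keeping the larger end means a short match lying inside a longer one cannot shrink the merged region.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: merge [start, end) intervals that overlap or touch.
sub merge_intervals {
    # Sort by start so the input order does not matter
    my @sorted = sort { $a->[0] <=> $b->[0] } @_;
    my @merged;
    for my $iv (@sorted) {
        if ( @merged && $iv->[0] <= $merged[-1][1] ) {
            # Overlapping or touching: extend the current region only if
            # this interval reaches further (a nested one must not shrink it)
            $merged[-1][1] = $iv->[1] if $iv->[1] > $merged[-1][1];
        }
        else {
            push @merged, [@$iv];    # copy so the caller's arrays are untouched
        }
    }
    return @merged;
}

# Case 4 from the question: CCAAC at [1,6) and ACACC at [4,9)
my @m = merge_intervals( [ 4, 9 ], [ 1, 6 ] );
print join( ' ', map { sprintf '[%d,%d)', @$_ } @m ), "\n";    # [1,9)
```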
Re: How do I put bracket on substring(s) of a string?
by jdporter (Paladin) on Sep 10, 2008 at 19:55 UTC
    sub put_brackets {
        my ( $str, @pats ) = @_;
        my $mask = ' ' x length $str;
        my $ofs  = 0;
        for (@pats) {
            $ofs = index $str, $_, $ofs;
            last if $ofs < 0;    # this pattern not found; look for no more.
            substr $mask, $ofs, length($_), 'a' x length($_);
        }
        # print "string= '$str'\n";
        # print "mask = '$mask'\n";
        join '', map sprintf( /a/ ? '[%s]' : '%s', substr $str, 0, length($_), '' ),
            split /\b/, $mask;
    }
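This answer records only the first occurrence of each substring, searches for the substrings in the order given, and stops at the first one that is missing. A sketch of the same mask idea that marks every occurrence of every substring, overlapping ones included, is below. `put_brackets_all` is a hypothetical name, and it uses an array of flags instead of a string mask. As with the mask, touching occurrences end up in one run and therefore in one bracketed region.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant of the mask approach: flag every character covered
# by any occurrence of any substring, then bracket each run of flags.
sub put_brackets_all {
    my ( $str, @pats ) = @_;
    my @marked = (0) x length($str);
    for my $pat (@pats) {
        next unless length $pat;    # an empty pattern would never advance
        my $ofs = 0;
        # Restart index() one past each hit so overlapping occurrences count
        while ( ( my $pos = index( $str, $pat, $ofs ) ) >= 0 ) {
            $marked[$_] = 1 for $pos .. $pos + length($pat) - 1;
            $ofs = $pos + 1;
        }
    }
    my $out = '';
    for my $i ( 0 .. length($str) - 1 ) {
        # Open a bracket where a run starts, close it where the run ends
        $out .= '[' if $marked[$i] && ( $i == 0 || !$marked[ $i - 1 ] );
        $out .= substr( $str, $i, 1 );
        $out .= ']' if $marked[$i] && ( $i == $#marked || !$marked[ $i + 1 ] );
    }
    return $out;
}

print put_brackets_all( 'CCACCAGCACCTGTC', qw(CCACC CCAGC GCACC) ), "\n";
# [CCACCAGCACC]TGTC
```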
Re: How do I put bracket on substring(s) of a string?
by aspect_khaliq (Novice) on Apr 04, 2007 at 11:47 UTC
    #!/usr/bin/perl -w
    use strict;

    my $a1 = "CCCATCTGTCCTTATTTGCTG";
    my $a2 = "ACCCATCTGTCCTTGGCCAT";
    my $a3 = "CCACCAGCACCTGTC";
    my $a4 = "CCCAACACCTGCTGCCT";

    $a1 =~ s/(ATCTG|ATTTG)/\[$1\]/g;
    $a2 =~ s/(CCATC)/\[$1\]/g;
    $a3 =~ s/(CCACC|CCAGC|GCACC)/\[$1\]/g;
    $a4 =~ s/(CCAAC|ACACC)/\[$1\]/g;

    print "$a1\n$a2\n$a3\n$a4\n";

    Originally posted as a Categorized Answer.
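The hard-coded substitutions above bracket each match separately, and `s///g` resumes scanning at the end of the previous match, so an occurrence that overlaps an earlier one is never seen: case 4 comes out as `C[CCAAC]ACCTGCTGCCT`. The short demonstration below uses case 4 to compare plain `/g` matching with the zero-width lookahead `(?=...)` that Roy Johnson's answer relies on; it only collects match start positions from `@-`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = 'CCCAACACCTGCTGCCT';    # case 4 from the question

# Plain /g resumes where the previous match ended, so the ACACC
# that overlaps CCAAC is skipped.
my @plain;
push @plain, $-[0] while $str =~ /CCAAC|ACACC/g;

# A zero-width lookahead consumes nothing, so the engine tries every
# start position and reports overlapping matches as well.
my @overlap;
push @overlap, $-[0] while $str =~ /(?=CCAAC|ACACC)/g;

print "plain /g:  @plain\n";      # plain /g:  1
print "lookahead: @overlap\n";    # lookahead: 1 4
```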

Re: How do I put bracket on substring(s) of a string?
by sathiya.sw (Monk) on Sep 10, 2008 at 08:53 UTC
    use strict;
    use warnings;

    my $s1 = 'CCCATCTGTCCTTATTTGCTG'; my @a1 = qw(ATCTG ATTTG);
    my $s2 = 'ACCCATCTGTCCTTGGCCAT';  my @a2 = qw(CCATC);
    my $s3 = 'CCACCAGCACCTGTC';       my @a3 = qw(CCACC CCAGC GCACC);
    my $s4 = 'CCCAACACCTGCTGCCT';     my @a4 = qw(CCAAC ACACC);

    put_bracket( $s1, @a1 );
    put_bracket( $s2, @a2 );
    put_bracket( $s3, @a3 );
    put_bracket( $s4, @a4 );

    sub put_bracket {
        my $str = shift;
        $str =~ s/$_/[$&]/ for @_;
        return $str;
    }
      When I print the string returned by each of your put_bracket calls, the output does not match the required output for 3 and 4:
      CCC[ATCTG]TCCTT[ATTTG]CTG
      AC[CCATC]TGTCCTTGGCCAT
      [CCACC]A[GCACC]TGTC
      C[CCAAC]ACCTGCTGCCT