
How does one get all possible matches from regex?

by supriyoch_2008 (Monk)
on Dec 10, 2013 at 03:08 UTC

supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmonks,

I would like to get all possible matches from a string using a regex, but my Perl script g1.pl returns fewer matches than I expect. The script, the incorrect results, and the expected results are given below. Could the monks suggest how to write the regex so that it produces the correct results?

Here goes the script:

#!/usr/bin/perl
use warnings;

$seq = "TT TATAAT CGCG ATG CAG GAG TGG TAA TGA TAG CC TGA TATAAT CCC ATG CTA CAT TGA TT";
$seq =~ s/\s//gs;
while ($seq =~ /([AG]TG).*?(TAA|TAG|TGA)+?/gs) {
    my $match = $&;
    $match =~ s/\s//g;
    push @matches, $match;
}
print "\n Matches are:\n\n";
print join("\n", @matches);
print "\n\n";
exit;

I have got the incorrect results like:

C:\Users\Dr Supriyo>cd desktop
C:\Users\Dr Supriyo\Desktop>g1.pl

 Matches are:

ATGCAGGAGTGGTAA
ATGCTACATTGA

The correct results should be:

Matches are:
ATGCAGGAGTGGTAA
ATGCAGGAGTGGTAATGA
ATGCAGGAGTGGTAATGATAG
ATGCAGGAGTGGTAATGATAGCCTGA
ATGCTACATTGA

Replies are listed 'Best First'.
Re: How does one get all possible matches from regex?
by educated_foo (Vicar) on Dec 10, 2013 at 03:31 UTC
    Your "correct results" don't correspond to what I get; if you want "all possible matches for REGEX," you should use this:
    1 while /REGEX(?{print $&})(?!)/;
    i.e. "match REGEX, print what it matched, then fail."
      I didn't understand the OP, but isn't this easier?

       print $1 while /(REGEX)/g

      Cheers Rolf

      ( addicted to the Perl Programming Language)

        They're different: /X/g returns non-overlapping matches; /X(?{print$&})(?!)/ returns every match the engine can find, overlapping ones included. It depends on what you want.
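
To make the difference between the two approaches concrete, here is a minimal sketch run against the sequence from the original question. The pattern is simplified from the OP's (no capture groups, a single stop codon at the end), and the array names @non_overlapping and @all are made up for illustration. It is one way to apply the "match, record, then fail" idea, not necessarily what the OP needs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sequence from the original question; whitespace is stripped as in g1.pl.
my $seq = "TT TATAAT CGCG ATG CAG GAG TGG TAA TGA TAG CC TGA TATAAT CCC ATG CTA CAT TGA TT";
$seq =~ s/\s+//g;

# Illustrative names, not from the thread.
my (@non_overlapping, @all);

# /g resumes after the end of the previous match, so matches never overlap.
push @non_overlapping, $1 while $seq =~ /([AG]TG.*?(?:TAA|TAG|TGA))/g;

# Record the current match, then force failure with (?!) so the engine
# backtracks and tries every other way to match, at every start position.
$seq =~ /[AG]TG.*?(?:TAA|TAG|TGA)(?{ push @all, $& })(?!)/;

print "Non-overlapping:\n", join("\n", @non_overlapping), "\n\n";
print "All matches:\n",     join("\n", @all),             "\n";
```

The /g loop finds only the two strings the OP already gets. The exhaustive list includes all five strings the OP expects, but also others (for example GTGGTAA, which starts at the GTG inside GAG TGG), so some extra filtering would still be needed to reproduce the "correct results" exactly.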
Re: How does one get all possible matches from regex? (combinations permutations)
by Anonymous Monk on Dec 10, 2013 at 03:58 UTC

    Try Regexp::Exhaustive - Find all possible matches, including backtracked and overlapping, of a pattern against a string

Re: How does one get all possible matches from regex?
by 2teez (Vicar) on Dec 10, 2013 at 06:13 UTC

    You don't want to be using '$&' in your script because "..once Perl sees that you need one of $`, $&, or $' anywhere in the program, it provides them for every pattern match. This will slow down your program a bit..." -- Programming Perl
    See also: Why does using $&, $`, or $' slow my program down?
    Just my 2 kobo advice.

    If you tell me, I'll forget.
    If you show me, I'll remember.
    If you involve me, I'll understand.
    --- Author unknown to me
      Is this still sound advice to give these days? The doc that you link to says:
      As of the 5.005 release, the $& variable is no longer "expensive" the way the other two are.
      Why Is $& Bad? gives a better technical explanation why one might want to avoid $&. Even so, the issue with $& is perhaps more of a concern for core module writers. Can you point to some recent benchmarks?
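
For anyone who would rather avoid $& regardless of the performance question, here is a minimal sketch of three common alternatives. The shortened test string and the array names are made up for illustration; the /p modifier and ${^MATCH} assume Perl 5.10 or later.

```perl
use strict;
use warnings;

# Shortened, illustrative sequence (not the OP's full string).
my $seq = 'CGCGATGCAGGAGTGGTAATGA';

my (@via_capture, @via_p, @via_offsets);

# 1. Capture group: wrap the whole pattern in parentheses and read $1.
push @via_capture, $1 while $seq =~ /([AG]TG.*?(?:TAA|TAG|TGA))/g;

# 2. /p modifier: ${^MATCH} holds the text that $& would hold.
push @via_p, ${^MATCH} while $seq =~ /[AG]TG.*?(?:TAA|TAG|TGA)/gp;

# 3. Offsets: $-[0] and $+[0] are the start and end of the last match.
push @via_offsets, substr($seq, $-[0], $+[0] - $-[0])
    while $seq =~ /[AG]TG.*?(?:TAA|TAG|TGA)/g;

print "capture: @via_capture\n";
print "/p:      @via_p\n";
print "offsets: @via_offsets\n";
```

All three collect the same single match here, ATGCAGGAGTGGTAA; which one to prefer is mostly a matter of taste and of the minimum Perl version you need to support.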

      On a related note, does study() actually do anything in more recent perls?
