PerlMonks  

Difference between these regexes

by haukex (Archbishop)
on Jul 29, 2016 at 12:36 UTC ( [id://1168807]=perlquestion )

haukex has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

I seek your wisdom: A recent node (Re: No tools? Use Perl?!) got me wondering: Is there a difference between these three regexes that I am missing, primarily in terms of what they match? In particular I'm interested in the first two - I understand the first and third regexes, but am not an expert on look-arounds.

  1. /<(.*?)>/s
  2. /<((?:(?!>).)*)>/s
  3. /<([^>]*)>/s

I have yet to find a difference - I am not sure if there even is one - but I'm probably not being creative enough in my test cases...

#!/usr/bin/env perl
use warnings;
use strict;
use Test::More;
#use re 'debug';

sub testre (_) {
    my $txt = shift;
    my @m1 = $txt =~ /<(.*?)>/sg;
    my @m2 = $txt =~ /<((?:(?!>).)*)>/sg;
    my @m3 = $txt =~ /<([^>]*)>/sg;
    is_deeply \@m1, \@m2, "$txt => (@m1) = (@m2)";
    is_deeply \@m1, \@m3, "$txt => (@m1) = (@m3)";
}

testre for (
    "<", "<<", "<<<", ">", ">>", ">>>",
    "<<<>", "<<>", "<>", "<>>", "<>>>", "<<>>", "<<<>>>",
    "<><>", "<><><>", "<><<><>", "<><>><>",
    "a<b>c", "a<b>>c", "a<b>>>c", "a<<b>c", "a<<<b>c",
    "a<<b>>c", "a<<<b>>>c",
    "a<b>c<d>e", "a<b>c<d>e<f>g", "a<b>c<<e>f<g>h",
    "a<b>c<d>>e<f>g", "a<b>c<e<f>g<h>i", "a<b>c<d>e>f<g>h",
    "<\n>\n", "<\n<\n>\n>\n", "<\n>\n<\n>\n",
);

done_testing;

Regards,
-- Hauke D

Replies are listed 'Best First'.
Re: Difference between these regexes (elaboration)
by tye (Sage) on Jul 29, 2016 at 14:32 UTC

    2 and 3 are equivalent (and I'll bet that 2 is less efficient). 1 is also equivalent as-is. However, if you were to use these constructs as part of a larger regex, then 1 would not always be equivalent.

    #!/usr/bin/perl -l
    $_ = '<left>no<right>yes';
    /<(.*?)>y/s;   print $1;
    /<([^>]*)>y/s; print $1;
    __END__
    left>no<right
    right

    - tye        
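To see where regex 2 falls once something follows the closing `>`, here is a minimal sketch that runs all three constructs against tye's sample string. The helper `first_capture` and the hash keys are names made up for this illustration; they are not from the thread.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Hypothetical helper: return the first capture group of $re matched
# against $txt, or undef if the pattern does not match at all.
sub first_capture {
    my ($re, $txt) = @_;
    my ($cap) = $txt =~ $re;
    return $cap;
}

my $txt = '<left>no<right>yes';

# The three constructs from the question, each followed by a literal "y".
my %capture = (
    lazy      => first_capture(qr/<(.*?)>y/s,         $txt),
    lookahead => first_capture(qr/<((?:(?!>).)*)>y/s, $txt),
    negclass  => first_capture(qr/<([^>]*)>y/s,       $txt),
);

printf "%-9s => %s\n", $_, $capture{$_} for sort keys %capture;
```

The lazy version is allowed to let `.` consume a `>` when that is the only way for the rest of the pattern to succeed, so it captures `left>no<right`. The other two can never put a `>` inside the capture, so both give `right`.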

Re: Difference between these regexes
by talexb (Chancellor) on Jul 29, 2016 at 13:22 UTC

    I'd recommend this fine paper as a starting point for your investigations. I would continue with Regexp::Debugger, which is more recent.

    And I like choice 3 in your original example, although the star suggests you might have '<>' in your input, which I would guess is unlikely. If the tag's empty, there's not much point in outputting it.

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.
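One caveat if the star is swapped for a plus to skip empty tags: the three constructs then stop being interchangeable. The sketch below uses `+` variants of the original regexes; these variants and the sample string are assumptions for illustration, not something posted in the thread.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# An empty tag immediately followed by a non-empty one.
my $txt = '<><a>';

# Hypothetical "+" variants of regexes 1 to 3 from the question.
my @lazy_plus      = $txt =~ /<(.+?)>/sg;
my @lookahead_plus = $txt =~ /<((?:(?!>).)+)>/sg;
my @negclass_plus  = $txt =~ /<([^>]+)>/sg;

print "lazy:      (@lazy_plus)\n";
print "lookahead: (@lookahead_plus)\n";
print "negclass:  (@negclass_plus)\n";
```

With `.+?` the dot must consume at least one character, and under /s nothing stops that character from being the `>` of the empty tag, so the lazy variant captures `><a`. The other two variants skip the empty tag and capture `a`.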

Re: Difference between these regexes
by Eily (Monsignor) on Jul 29, 2016 at 13:16 UTC

    The three cases will indeed match a string from < to the first > on its right, but for different reasons. In 1, the non-greedy ? modifier can be understood to mean "stop as soon as a match is found, going from left to right", so it finds "the first match". In 2 and 3, no > can be contained in the capture. (?!>). and [^>] match the same thing under /s.
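The "under /s" qualifier matters: without /s, `.` does not match a newline, while `[^>]` still does. A small sketch of that, with a made-up sample string and variable names:

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# A tag whose content spans two lines.
my $txt = "<a\nb>";

# With /s both constructs capture across the newline.
my ($look_s)  = $txt =~ /<((?:(?!>).)*)>/s;
my ($class_s) = $txt =~ /<([^>]*)>/s;

# Without /s, "." refuses the newline, so the look-ahead form cannot
# reach the closing ">", while the negated class is unaffected.
my ($look_plain)  = $txt =~ /<((?:(?!>).)*)>/;
my ($class_plain) = $txt =~ /<([^>]*)>/;

for my $pair ( [ 'lookahead /s', $look_s ],     [ 'class /s', $class_s ],
               [ 'lookahead',    $look_plain ], [ 'class',    $class_plain ] ) {
    my ($label, $cap) = @$pair;
    $cap = defined $cap ? join('\n', split /\n/, $cap) : '(no match)';
    printf "%-13s => %s\n", $label, $cap;
}
```

So dropping /s from regex 2 changes what it matches, whereas regex 3 behaves the same with or without the modifier.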

Re: Difference between these regexes
by Laurent_R (Canon) on Jul 29, 2016 at 13:09 UTC
    Hi Hauke,

    I am fairly sure that 1 and 3 are equivalent (although written differently). I have been using one or the other syntax many times.

    I would *think* that 2 is also doing the same, but there might be some edge case where it differs from the other two (although I can't think of any at the moment).

Re: Difference between these regexes
by haukex (Archbishop) on Jul 30, 2016 at 10:49 UTC

    Thank you very much, everyone who replied!

    Taking all the replies together I think that confirms that the set of strings matched is the same for all three of these particular regexes.

    This also confirms my initial suspicion that the regex m{<ReportHost[^>]*>(?:(?!</ReportHost>).)*</ReportHost>}s could also be written as m{<ReportHost[^>]*>.*?</ReportHost>}s.

    Regards,
    -- Hauke D
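A quick way to check the two ReportHost patterns against each other is to run both over the same input and compare the matches. The sample data below is invented for this sketch, and the variable names are illustrative only.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Made-up sample input: two elements, the second spanning two lines.
my $xml = join "\n",
    '<ReportHost name="a">first</ReportHost>',
    '<ReportHost name="b">second',
    'line</ReportHost>',
    '';

# The two patterns from the post above, stored as compiled regexes.
my $re_lookahead = qr{<ReportHost[^>]*>(?:(?!</ReportHost>).)*</ReportHost>}s;
my $re_lazy      = qr{<ReportHost[^>]*>.*?</ReportHost>}s;

# Collect every whole-element match for each pattern.
my @by_lookahead = $xml =~ /($re_lookahead)/g;
my @by_lazy      = $xml =~ /($re_lazy)/g;

my $same = join("\0", @by_lookahead) eq join("\0", @by_lazy);
print "elements found: ", scalar(@by_lazy), "\n";
print "same matches:   ", ($same ? "yes" : "no"), "\n";
```

This agreement relies on `</ReportHost>` being the last thing in the pattern. As tye's example shows, appending anything after the closing delimiter lets the lazy `.*?` backtrack past it, and the two forms can then diverge.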

      There are many ways and tools to figure out a regexp. One of them, imho, is http://www.regexr.com/. You can mouse over the regexp to see what each part is looking for, and the result panel shows the matched range, groups, etc. You might want to try it on your next quest.

Node Type: perlquestion [id://1168807]
Approved by stevieb
Front-paged by kcott