Regex Semantics

by cbro (Pilgrim)
on Oct 04, 2006 at 21:03 UTC ( [id://576410]=perlquestion )

cbro has asked for the wisdom of the Perl Monks concerning the following question:

***RESOLVED: Thank you everybody for your replies/help. Also, thanks for dealing with the stupid example :P The point was understanding how the anchors and '*' were interacting, and everybody's responses helped clarify. Much obliged.***

Good Afternoon,

I'm finding myself in a semantics battle against myself, and I'm losing. I'm truly hoping somebody can smack some sense into me here. I speak the regex /^[a]*$/ as, "A line beginning and ending with zero or more characters in the set {a}". With that, I understand how an empty string will match, e.g.
$foo = ""; if ($foo =~ /^[a]*$/) { print "Match\n"; }
...prints "Match". So, if this regex does translate to, "A line beginning and ending with zero or more letters in the set/class {a}"...then why doesn't "Match" get printed if I set
$foo = "1";
And to finalize my question (which is all about not TRULY understanding how anchors are changing the semantics), why will
$foo = "1"; if ($foo =~ /^[a]*/) { print "Match\n"; }
.....print "Match". What are these anchors doing in combination with the '*'? Thanks in advance for any assistance. Chris

Replies are listed 'Best First'.
Re: Regex Semantics
by ikegami (Patriarch) on Oct 04, 2006 at 21:12 UTC

    /^[a]*$/ reads as
    1) Match the start of the string,
    2) immediately followed by 0 or more 'a',
    3) immediately followed by "\n" (optional) and the end of the string.

    When matching against "1", step (3) fails.
    1) Match the start of the string -> ok, pos = 0
    2) immediately followed by 0 or more 'a' -> ok (matched 0 'a's), pos = 0
    3) immediately followed by "\n" (optional) and the end of the string -> fail (no "\n" or eos at pos 0)

    /^[a]*/ reads as:
    1) Match the start of the string,
    2) immediately followed by 0 or more 'a'.

    When matching against "1", it matches.
    1) Match the start of the string -> ok, pos = 0
    2) immediately followed by 0 or more 'a' -> ok (matched 0 'a's), pos = 0
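The two walkthroughs above can be checked with a short script. This is only a sketch, not part of the original reply; the helper name describe is made up for illustration. It reports whether a pattern matches and, if so, how many characters the match consumed, read from the built-in @+ array.

```perl
use strict;
use warnings;

# Hypothetical helper: say whether $str matches $re and how far the match got.
sub describe {
    my ($str, $re) = @_;
    if ($str =~ $re) {
        my $end = $+[0];    # offset just past the end of the whole match
        return "match, consumed $end char(s)";
    }
    return "no match";
}

for my $str ('', '1', 'aaa') {
    print "'$str' against /^[a]*/  : ", describe($str, qr/^[a]*/),  "\n";
    print "'$str' against /^[a]*\$/ : ", describe($str, qr/^[a]*$/), "\n";
}
```

For "1", the unanchored pattern succeeds after consuming zero characters, while the version ending in $ fails because position 0 is not the end of the string.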

    U: Changed "followed" to "immediately followed" for extra clarity.
    U: Added '"\n" (optional) and'.

Re: Regex Semantics
by wazzuteke (Hermit) on Oct 04, 2006 at 21:11 UTC
    Basically, the regex /^[a]*/ says 'May or may not start with 'a', but otherwise I don't care'. In other words, this regex will evaluate to TRUE with any value you pass it, since '*' signifies 'zero or many'.

    If you are looking for a regex that says 'Must start with the character class I specify, otherwise fail', you will need to replace the '*' with a '+', where the '+' implies 'one or many' rather than 'zero or many'.

    Therefore:
    $foo = "1"; if ( $foo =~ /^[a]+/ ) { print "Match\n"; }
    You will see the difference.
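A side-by-side comparison may make the difference easier to see. This is a small sketch; the %matched hash and the sample strings are chosen only for illustration.

```perl
use strict;
use warnings;

# Record, for each sample string, whether each quantifier lets the pattern match.
my %matched;
for my $str ('1', 'a', 'a1', '') {
    $matched{star}{$str} = $str =~ /^[a]*/ ? 1 : 0;   # zero or more: cannot fail
    $matched{plus}{$str} = $str =~ /^[a]+/ ? 1 : 0;   # one or more: needs a leading 'a'
    print "'$str': * -> $matched{star}{$str}, + -> $matched{plus}{$str}\n";
}
```

Note that /^[a]+/ still matches "a1": without a trailing anchor it only constrains the start of the string.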

    Also, a good resource for these types of questions is the Perl documentation. For Perl regular expressions, try 'perldoc perlre' from the command line, or PerlMonks has the page as well.

    ---------
    perl -le '$.=[qw(104 97 124 124 116 97)];*p=sub{[@{$_[0]},(45)x 2]};*d=sub{[(45)x 2,@{$_[0]}]};print map{chr}@{p(d($.))}'
Re: Regex Semantics
by GrandFather (Saint) on Oct 04, 2006 at 21:16 UTC

    /^[a]*$/ matches a string containing 0 or more 'a' characters which may be followed by a new line character.

    Note that the character class is redundant. It could equally well be written /^a*$/.

    Consider:

    use strict;
    use warnings;

    my @strings = ('', '1', 'a', "a\n");

    /^[a]*$/ && print "/^[a]*\$/ matches >$_<\n" for @strings;
    /^a*$/ && print "/^a*\$/ matches >$_<\n" for @strings;

    Prints:

    /^[a]*$/ matches ><
    /^[a]*$/ matches >a<
    /^[a]*$/ matches >a
    <
    /^a*$/ matches ><
    /^a*$/ matches >a<
    /^a*$/ matches >a
    <
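The optional trailing newline is a property of $. If that leniency is unwanted, \z matches only at the very end of the string. A minimal sketch follows; the %ends hash and the sample strings are picked purely for illustration.

```perl
use strict;
use warnings;

# Compare $ (end, or just before a final newline) with \z (absolute end).
my %ends;
for my $str ("a", "a\n", "a\n\n") {
    (my $shown = $str) =~ s/\n/\\n/g;    # make newlines visible in the output
    $ends{dollar}{$shown} = $str =~ /^a*$/  ? 1 : 0;
    $ends{z}{$shown}      = $str =~ /^a*\z/ ? 1 : 0;
    print "\"$shown\": \$ -> $ends{dollar}{$shown}, \\z -> $ends{z}{$shown}\n";
}
```

Only a single newline at the very end is tolerated by $, so "a\n\n" fails with both anchors.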

    DWIM is Perl's answer to Gödel
Re: Regex Semantics
by diotalevi (Canon) on Oct 04, 2006 at 21:13 UTC

    The regexp ^a*$ says that there may be zero or more a characters followed by an optional newline and then the end of the string. When $foo is '1', the end of the string is tested before the '1' has been matched, so the match fails. Your pattern succeeded when you omitted $ because you didn't require that you'd gotten to the end of the string. '1' =~ /^a*/ succeeds with no characters matched, like /^/, while 'a' =~ /^a*$/ succeeds with all three parts matched: /^a*$/.

    1. ^: Matched the start of the string
    2. a*: Matched nothing
    3. $: Failed because we haven't gotten to the end of the string

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: Regex Semantics
by ysth (Canon) on Oct 04, 2006 at 21:58 UTC
    So, a regex that actually matched "A line beginning and ending with zero or more letters in the set/class {a}" would be /^[a]*.*[a]*$/s: The beginning of the string has to be followed by zero or more a's, followed by zero or more things in the middle of the string, followed by zero or more a's before the end of the string (or before a newline preceding the end). But with that "zero" in there instead of "one" or more, you are only requiring that the string have a beginning and an end, which is true of all strings.
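The claim that such a pattern accepts every string can be spot-checked. The sketch below assumes the pattern as described in words above, with both 'a' groups quantified by '*'; the sample strings are arbitrary.

```perl
use strict;
use warnings;

# Every part of this pattern may match nothing, and .* with /s may match anything.
my $re = qr/^[a]*.*[a]*$/s;

my @samples = ('', '1', 'a', 'xyz', "two\nlines", "a1a\n");
my @failed  = grep { $_ !~ $re } @samples;

print @failed ? "some samples failed\n" : "every sample matched\n";
```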

      So, a regex that actually matched "A line beginning and ending with zero or more letters in the set/class {a}" would be

      I didn't touch that in my post because both /^.*\z/s and 1 would be sufficient to match "A line beginning and ending with zero or more letters in the set/class {a}".

      If you wanted a regexp that matched "A line beginning and ending with one or more letters in the set/class {a}", you'd use
      /^[a]/ && /[a]\z/ (Two regexps),
      /^(?=[a])(?=.*[a]\z)/s (Lookaheads) or
      /^[a].*(?<=[a])\z/s (One of the few times (?<=...) works.)

      /^[a].*[a]\z/s would not do, since it wouldn't match "a".
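The alternatives above can be compared on a few strings. This is a sketch only: the %check hash and its key names are invented for illustration, and the lookbehind variant is written with balanced parentheses.

```perl
use strict;
use warnings;

# Each entry wraps one candidate for "begins and ends with a letter in [a]".
my %check = (
    two_regexps => sub { $_[0] =~ /^[a]/ && $_[0] =~ /[a]\z/ },
    lookaheads  => sub { $_[0] =~ /^(?=[a])(?=.*[a]\z)/s },
    lookbehind  => sub { $_[0] =~ /^[a].*(?<=[a])\z/s },
    naive       => sub { $_[0] =~ /^[a].*[a]\z/s },   # needs at least two characters
);

for my $name (sort keys %check) {
    my @hits = grep { $check{$name}->($_) } ('a', 'aba', 'ab', 'ba', '');
    print "$name matches: @hits\n";
}
```

The first three accept the one-character string "a", where the same character serves as both beginning and end; the naive pattern requires two separate characters and so rejects it.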

Node Type: perlquestion [id://576410]
Approved by Corion