Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

A (non) reg-ex question

by Ido (Hermit)
on Mar 20, 2006 at 12:42 UTC ( [id://537930]=perlquestion: print w/replies, xml ) Need Help??

Ido has asked for the wisdom of the Perl Monks concerning the following question:

How could we match the language 0{n}1{n}?
I came up with :
 /^(0*)(?{$i=length($1)})(??{"1{$i}"})$/
, but it's quiet ugly, do you have any alternatives?

Also, what if I only wanted the number of 0s and the number of 1s to be equal?

Replies are listed 'Best First'.
Re: A (non) reg-ex question
by hv (Prior) on Mar 20, 2006 at 14:16 UTC

    Here's one approach:

    !/[^01]/ && !/10/ && ($_ ^ reverse $_) !~ /[^\001]/

    That is: a) check that there are no characters that are not 0 or 1, b) check that no 1s precede 0s, c) XOR the string with its reverse - it should give chr(1) for every character.

    Another approach, this time destructive:

    1 while s/(?<!1)01(?!0)//g; length($_) == 0;

    That is: delete "01" from the centre as often as you can, and verify that this deletes all characters from the string. The lookarounds guard against deleting from 101 or 010.

    And another, using a single recursive regexp:

    our $re; $re = qr{(?:0(??{$re})1)?}; /^$re\z/;

    That is: MATCH := "" | "0" MATCH "1".

    Update: added missing '/' in s///, per johngg.

    Hugo

      For amusement, here is the recursive regexp modified to match the case where there are as many 1's as 0's (in any order):
      my $re; $re = qr{ (?: 0 # 0 (??{$re}) # Balanced 1's and 0's 1 # 1 | 1 # 1 (??{$re}) # Balanced 1's and 0's 0 # 0 )* }x; /^$re\z/;
      A random note. For amusement I tried to optimize this, but that failed. I believe the following should be fast, but it isn't, and that's probably a bug (in my understanding, or in the RE engine):
      my $re; $re = qr{ (?: 0 # 0 (?> # If I get here, do not backtrack! (??{$re}) # Balanced 1's and 0's ) 1 # 1 | 1 # 1 (?> # If I get here, do not backtrack! (??{$re}) # Balanced 1's and 0's ) 0 # 0 )* }x; for (qw(1001 11100 000111 00111), "101"x50) { if (/^$re\z/) { print "$_ matched\n"; } else { print "$_ did not match\n"; } }
      Given the performance problems, I would not recommend using this regular expression in real life.

      UPDATE: It was a bug in my understanding. The following works and is fast:

      my $re; $re = qr{ (?: 0 # 0 (??{$re}) # Balanced 1's and 0's 1 # 1 | 1 # 1 (?> # If I get here, do not backtrack! (??{$re}) # Balanced 1's and 0's ) 0 # 0 )*? }x; for (qw(1001 11100 000111 00111), "101"x50) { if (/^$re\z/) { print "$_ matched\n"; } else { print "$_ did not match\n"; } }
      And so is this:
      my $re; $re = qr{ (?> # Avoid backtracking 0 # 0 (??{$re}) # Balanced 1's and 0's 1 # 1 | 1 # 1 (??{$re}) # Balanced 1's and 0's 0 # 0 )*? }x; for (qw(1001 11100 000111 00111), "101"x50) { if (/^$re\z/) { print "$_ matched\n"; } else { print "$_ did not match\n"; } }
      The key is that you need to avoid having to backtrack on success.
      I think you may have missed the last "/" in your global substitute.

      Cheers,

      JohnGG

      A supplementary reply, this time in the form of a question.

      I am trying to understand your third method that uses the recursive regular expression. What I think it is doing is finding balanced "01" pairs in the same way as the Camel book example finds balanced "()" pairs. What I want to know is, how does the regular expression know when to stop recursing? Is the "?" quantifier something to do with it? It seems to me that the regular expression is consuming characters from both ends of the string at the same time. Am I missing something obvious?

      Cheers,

      JohnGG

        I think my edition of the Camel predates qr{} so I doubt I have that example to hand, but I'd expect it to be much the same except that balanced parens can usually look like "()(())(()())" and the like.

        Except for certain special optimisations, a regular expression is always traversed from left to right. The critical thing to stop infinite recursion is to ensure that at least one character must be matched each time before recursing, ie that there be something to the left of the nested $re.

        In this case, ($re = qr{(?:0(??{$re})1)?}), we require a '0' to be matched before recursing. So "0011" =~ /^$re\z/ will be executed like:

        anchor to start try $re match '0' try $re match '0' try $re fail to find '0' group is optional, matched zero times $re matched '' match '1' group is optional, matched one time $re matched '01' match '1' group is optional, matched one time $re matched '0011' anchor to end match complete: matched '0011' from start

        Similarly, you could (foolishly) replace /0+/ with a recursive regexp like:

        $re = qr{0(??{$re})?};
        which would work, but the following would fall into infinite recursion:
        $re = qr{(??{$re})?0};

        Hope this helps,

        Hugo

Re: A (non) reg-ex question
by salva (Canon) on Mar 20, 2006 at 13:03 UTC
    How could we match the language 0{n}1{n}?

    If you already know that $_ contains '0' and '1' chars only:

    tr/0// == tr/1// and index $_, '10' < 0
    else:
    my $hl = (1+length) >> 1; tr/0// == $hl and tr/1// == $hl and index $_, '10' < 0
    or just
    /^(0*)(1*)$/ and length $1 == length $2

    what if I only wanted the number of 0s and the number of 1s to be equal?

    tr/0// == tr/1//
Re: A (non) reg-ex question
by mickeyn (Priest) on Mar 20, 2006 at 14:10 UTC
    how about:
    $str =~ /(0+)(1+)/; $is0n1n = length($1)<=>length($2); # 0 if match

    Enjoy,
    Mickey

Re: A (non) reg-ex question
by holli (Abbot) on Mar 20, 2006 at 13:14 UTC
    but it's quiet ugly, do you have any alternatives?
    Regexes are ugly by design ;-)


    holli, /regexed monk/
Re: A (non) reg-ex question
by timos (Beadle) on Mar 20, 2006 at 12:59 UTC
    Maybe:
    my @foo = split /01/"00001111"; $foo[0]=~s/0/1/; if ($foo[0] eq $foo[1]) {print "String is OK\n";}
      Mm, I forgot to test if
      "00001111"=~/^[0]+[1]+$/;
Re: A (non) reg-ex question
by exussum0 (Vicar) on Mar 20, 2006 at 22:33 UTC
    You can't. Yes you can. You got the implemented answer. Just a blurb on why you can't, but perl breaks the rule.

    The long, "you can't," answer: A regexp, an achedemic regexp, won't work. I point that out since your example is a perl regular expression. I suggest looking up the pumping lemma and the basic tenants of a regular language. In short, what you need is something that requires stack memory. A real regular expression only needs memory for the input, the expression itself, and a pointer to say where weare at in the regexp. If the number of 0's and 1's have some sort of arithmetic relation, more likely than not, you can't. There are exceptions.

    The long, "you can," answer: perl's regexp's are an extension of the regular expressions used for regular languages. The fact you have the /e flag totally completely blows it out of the water since oyu can freely allocate variables. Perl's regex's are a wonderful thing. Just stating the fact you can do context sensitive stuff in it.

      The /e flag does not work on a regex. Instead, it's a flag that says the right hand side of a s///, which is a string, needs to be treated as code. The regex is the left hand side in s/// (or the whole thing in //).

      Surely you must have been thinking of (?{CODE}) and friends? (See perlre)

        Yeah, thanks. Whoops.
Re: A (non) reg-ex question
by codeacrobat (Chaplain) on Mar 20, 2006 at 20:54 UTC
    Another way would be to use Parse::RecDescent.
    #!/usr/bin/perl -w use strict; use Parse::RecDescent; my $parser = new Parse::RecDescent (q{ L: '0' L '1' L: '' }); my $string1 = "00111"; my $string2 = "0011"; print $parser->L(\$string1) && $string1 eq '' ? "" : "not ", "in L\n"; + # not in L print $parser->L(\$string2) && $string2 eq '' ? "" : "not ", "in L\n"; + # in L
    I am a Parse::RecDescent newbie. Does someone know to write the evaluation in a better way?
Re: A (non) reg-ex question
by Anonymous Monk on Mar 20, 2006 at 16:00 UTC
    Assuming the language consists of 1s and 0s...

    sub belongs_to_L { my ($s) = @_; my ($num_zeros,$num_ones,$junk); ($num_zeros,$num_ones,$junk) = ( $s =~ m/^(0*)(1*)(.*)/); return 1 if ( length($zeros) == length($ones) and length($junk) == 0); return 0; }

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://537930]
Approved by bart
Front-paged by grinder
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (7)
As of 2024-04-18 14:49 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found