A (non) reg-ex question

Ido has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: A (non) reg-ex question by hv (Prior) on Mar 20, 2006 at 14:16 UTC
Here's one approach: `!/[^01]/ && !/10/ && ($_ ^ reverse $_) !~ /[^\001]/` [download] That is: a) check that there are no characters that are not 0 or 1, b) check that no 1s precede 0s, c) XOR the string with its reverse - it should give chr(1) for every character. Another approach, this time destructive: `1 while s/(?<!1)01(?!0)//g; length($_) == 0;` [download] That is: delete "01" from the centre as often as you can, and verify that this deletes all characters from the string. The lookarounds guard against deleting from 101 or 010. And another, using a single recursive regexp: `our $re; $re = qr{(?:0(??{$re})1)?}; /^$re\z/;` [download] That is: `MATCH := "" \| "0" MATCH "1"`. Update: added missing '/' in s///, per johngg. Hugo	[reply] [d/l] [select]
Re^2: A (non) reg-ex question by tilly (Archbishop) on Mar 21, 2006 at 04:36 UTC
For amusement, here is the recursive regexp modified to match the case where there are as many 1's as 0's (in any order): `my $re; $re = qr{ (?: 0 # 0 (??{$re}) # Balanced 1's and 0's 1 # 1 \| 1 # 1 (??{$re}) # Balanced 1's and 0's 0 # 0 )* }x; /^$re\z/;` [download] A random note. For amusement I tried to optimize this, but that failed. I believe the following should be fast, but it isn't, and that's probably a bug (in my understanding, or in the RE engine): `my $re; $re = qr{ (?: 0 # 0 (?> # If I get here, do not backtrack! (??{$re}) # Balanced 1's and 0's ) 1 # 1 \| 1 # 1 (?> # If I get here, do not backtrack! (??{$re}) # Balanced 1's and 0's ) 0 # 0 )* }x; for (qw(1001 11100 000111 00111), "101"x50) { if (/^$re\z/) { print "$_ matched\n"; } else { print "$_ did not match\n"; } }` [download] Given the performance problems, I would not recommend using this regular expression in real life. UPDATE: It was a bug in my understanding. The following works and is fast: `my $re; $re = qr{ (?: 0 # 0 (??{$re}) # Balanced 1's and 0's 1 # 1 \| 1 # 1 (?> # If I get here, do not backtrack! (??{$re}) # Balanced 1's and 0's ) 0 # 0 )? }x; for (qw(1001 11100 000111 00111), "101"x50) { if (/^$re\z/) { print "$_ matched\n"; } else { print "$_ did not match\n"; } }` [download] And so is this: `my $re; $re = qr{ (?> # Avoid backtracking 0 # 0 (??{$re}) # Balanced 1's and 0's 1 # 1 \| 1 # 1 (??{$re}) # Balanced 1's and 0's 0 # 0 )? }x; for (qw(1001 11100 000111 00111), "101"x50) { if (/^$re\z/) { print "$_ matched\n"; } else { print "$_ did not match\n"; } }` [download] The key is that you need to avoid having to backtrack on success.	[reply] [d/l] [select]
Re^2: A (non) reg-ex question by johngg (Canon) on Mar 20, 2006 at 16:25 UTC
I think you may have missed the last "/" in your global substitute. Cheers, JohnGG	[reply]
Re^2: A (non) reg-ex question by johngg (Canon) on Mar 20, 2006 at 21:24 UTC
A supplementary reply, this time in the form of a question. I am trying to understand your third method that uses the recursive regular expression. What I think it is doing is finding balanced "01" pairs in the same way as the Camel book example finds balanced "()" pairs. What I want to know is, how does the regular expression know when to stop recursing? Is the "?" quantifier something to do with it? It seems to me that the regular expression is consuming characters from both ends of the string at the same time. Am I missing something obvious? Cheers, JohnGG	[reply]
Re^3: A (non) reg-ex question by hv (Prior) on Mar 20, 2006 at 23:51 UTC
I think my edition of the Camel predates qr{} so I doubt I have that example to hand, but I'd expect it to be much the same except that balanced parens can usually look like `"()(())(()())"` and the like. Except for certain special optimisations, a regular expression is always traversed from left to right. The critical thing to stop infinite recursion is to ensure that at least one character must be matched each time before recursing, ie that there be something to the left of the nested `$re`. In this case, (`$re = qr{(?:0(??{$re})1)?}`), we require a '0' to be matched before recursing. So `"0011" =~ /^$re\z/` will be executed like: `anchor to start try $re match '0' try $re match '0' try $re fail to find '0' group is optional, matched zero times $re matched '' match '1' group is optional, matched one time $re matched '01' match '1' group is optional, matched one time $re matched '0011' anchor to end match complete: matched '0011' from start` [download] Similarly, you could (foolishly) replace `/0+/` with a recursive regexp like: `$re = qr{0(??{$re})?};` [download] which would work, but the following would fall into infinite recursion: `$re = qr{(??{$re})?0};` [download] Hope this helps, Hugo	[reply] [d/l] [select]
Re^4: A (non) reg-ex question by johngg (Canon) on Mar 21, 2006 at 00:11 UTC
Re: A (non) reg-ex question by salva (Canon) on Mar 20, 2006 at 13:03 UTC
How could we match the language 0{n}1{n}? If you already know that $_ contains '0' and '1' chars only: `tr/0// == tr/1// and index $_, '10' < 0` [download] else: `my $hl = (1+length) >> 1; tr/0// == $hl and tr/1// == $hl and index $_, '10' < 0` [download] or just `/^(0)(1)$/ and length $1 == length $2` [download] what if I only wanted the number of 0s and the number of 1s to be equal? `tr/0// == tr/1//` [download]	[reply] [d/l] [select]
Re: A (non) reg-ex question by mickeyn (Priest) on Mar 20, 2006 at 14:10 UTC
how about: `$str =~ /(0+)(1+)/; $is0n1n = length($1)<=>length($2); # 0 if match` [download] Enjoy, Mickey	[reply] [d/l]
Re: A (non) reg-ex question by holli (Abbot) on Mar 20, 2006 at 13:14 UTC
but it's quiet ugly, do you have any alternatives? Regexes are ugly by design ;-) holli, /regexed monk/	[reply]
Re: A (non) reg-ex question by timos (Beadle) on Mar 20, 2006 at 12:59 UTC
Maybe: `my @foo = split /01/"00001111"; $foo[0]=~s/0/1/; if ($foo[0] eq $foo[1]) {print "String is OK\n";}` [download]	[reply] [d/l]
Re^2: A (non) reg-ex question by timos (Beadle) on Mar 20, 2006 at 13:03 UTC
Mm, I forgot to test if `"00001111"=~/^[0]+[1]+$/;` [download]	[reply] [d/l]
Re: A (non) reg-ex question by exussum0 (Vicar) on Mar 20, 2006 at 22:33 UTC
You can't. Yes you can. You got the implemented answer. Just a blurb on why you can't, but perl breaks the rule. The long, "you can't," answer: A regexp, an achedemic regexp, won't work. I point that out since your example is a perl regular expression. I suggest looking up the pumping lemma and the basic tenants of a regular language. In short, what you need is something that requires stack memory. A real regular expression only needs memory for the input, the expression itself, and a pointer to say where weare at in the regexp. If the number of 0's and 1's have some sort of arithmetic relation, more likely than not, you can't. There are exceptions. The long, "you can," answer: perl's regexp's are an extension of the regular expressions used for regular languages. The fact you have the /e flag totally completely blows it out of the water since oyu can freely allocate variables. Perl's regex's are a wonderful thing. Just stating the fact you can do context sensitive stuff in it.	[reply]
Re^2: A (non) reg-ex question by bart (Canon) on Mar 21, 2006 at 11:03 UTC
The /e flag does not work on a regex. Instead, it's a flag that says the right hand side of a `s///`, which is a string, needs to be treated as code. The regex is the left hand side in `s///` (or the whole thing in `//`). Surely you must have been thinking of `(?{CODE})` and friends? (See perlre)	[reply] [d/l] [select]
Re^3: A (non) reg-ex question by exussum0 (Vicar) on Mar 21, 2006 at 16:06 UTC
Yeah, thanks. Whoops.	[reply]
Re: A (non) reg-ex question by codeacrobat (Chaplain) on Mar 20, 2006 at 20:54 UTC
Another way would be to use Parse::RecDescent. `#!/usr/bin/perl -w use strict; use Parse::RecDescent; my $parser = new Parse::RecDescent (q{ L: '0' L '1' L: '' }); my $string1 = "00111"; my $string2 = "0011"; print $parser->L(\$string1) && $string1 eq '' ? "" : "not ", "in L\n"; + # not in L print $parser->L(\$string2) && $string2 eq '' ? "" : "not ", "in L\n"; + # in L` [download] I am a Parse::RecDescent newbie. Does someone know to write the evaluation in a better way?	[reply] [d/l]
Re: A (non) reg-ex question by Anonymous Monk on Mar 20, 2006 at 16:00 UTC
Assuming the language consists of 1s and 0s... `sub belongs_to_L { my ($s) = @_; my ($num_zeros,$num_ones,$junk); ($num_zeros,$num_ones,$junk) = ( $s =~ m/^(0)(1)(.*)/); return 1 if ( length($zeros) == length($ones) and length($junk) == 0); return 0; }` [download]	[reply] [d/l]


Clear questions and runnable code get the best and fastest answer
	PerlMonks