Re: A (non) reg-ex question
by hv (Prior) on Mar 20, 2006 at 14:16 UTC
|
!/[^01]/ && !/10/ && ($_ ^ reverse $_) !~ /[^\001]/
That is: a) check that there are no characters that are not 0 or 1, b) check that no 1s precede 0s, c) XOR the string with its reverse - it should give chr(1) for every character.
Another approach, this time destructive: 1 while s/(?<!1)01(?!0)//g;
length($_) == 0;
That is: delete "01" from the centre as often as you can, and verify that this deletes all characters from the string. The lookarounds guard against deleting from 101 or 010.
And another, using a single recursive regexp: our $re;
$re = qr{(?:0(??{$re})1)?};
/^$re\z/;
That is: MATCH := "" | "0" MATCH "1".
Update: added missing '/' in s///, per johngg.
Hugo | [reply] [d/l] [select] |
|
For amusement, here is the recursive regexp modified to match the case where there are as many 1's as 0's (in any order):
my $re;
$re = qr{
(?:
0 # 0
(??{$re}) # Balanced 1's and 0's
1 # 1
|
1 # 1
(??{$re}) # Balanced 1's and 0's
0 # 0
)*
}x;
/^$re\z/;
A random note. For amusement I tried to optimize this, but that failed. I believe the following should be fast, but it isn't, and that's probably a bug (in my understanding, or in the RE engine):
my $re;
$re = qr{
(?:
0 # 0
(?> # If I get here, do not backtrack!
(??{$re}) # Balanced 1's and 0's
)
1 # 1
|
1 # 1
(?> # If I get here, do not backtrack!
(??{$re}) # Balanced 1's and 0's
)
0 # 0
)*
}x;
for (qw(1001 11100 000111 00111), "101"x50) {
if (/^$re\z/) {
print "$_ matched\n";
}
else {
print "$_ did not match\n";
}
}
Given the performance problems, I would not recommend using this regular expression in real life.
UPDATE: It was a bug in my understanding. The following works and is fast:
my $re;
$re = qr{
(?:
0 # 0
(??{$re}) # Balanced 1's and 0's
1 # 1
|
1 # 1
(?> # If I get here, do not backtrack!
(??{$re}) # Balanced 1's and 0's
)
0 # 0
)*?
}x;
for (qw(1001 11100 000111 00111), "101"x50) {
if (/^$re\z/) {
print "$_ matched\n";
}
else {
print "$_ did not match\n";
}
}
And so is this:
my $re;
$re = qr{
(?> # Avoid backtracking
0 # 0
(??{$re}) # Balanced 1's and 0's
1 # 1
|
1 # 1
(??{$re}) # Balanced 1's and 0's
0 # 0
)*?
}x;
for (qw(1001 11100 000111 00111), "101"x50) {
if (/^$re\z/) {
print "$_ matched\n";
}
else {
print "$_ did not match\n";
}
}
The key is that you need to avoid having to backtrack on success. | [reply] [d/l] [select] |
|
I think you may have missed the last "/" in your global substitute.Cheers, JohnGG
| [reply] |
|
A supplementary reply, this time in the form of a question.I am trying to understand your third method that uses the recursive regular expression. What I think it is doing is finding balanced "01" pairs in the same way as the Camel book example finds balanced "()" pairs. What I want to know is, how does the regular expression know when to stop recursing? Is the "?" quantifier something to do with it? It seems to me that the regular expression is consuming characters from both ends of the string at the same time. Am I missing something obvious? Cheers, JohnGG
| [reply] |
|
I think my edition of the Camel predates qr{} so I doubt I have that example to hand, but I'd expect it to be much the same except that balanced parens can usually look like "()(())(()())" and the like.
Except for certain special optimisations, a regular expression is always traversed from left to right. The critical thing to stop infinite recursion is to ensure that at least one character must be matched each time before recursing, ie that there be something to the left of the nested $re.
In this case, ($re = qr{(?:0(??{$re})1)?}), we require a '0' to be matched before recursing. So "0011" =~ /^$re\z/ will be executed like:
anchor to start
try $re
match '0'
try $re
match '0'
try $re
fail to find '0'
group is optional, matched zero times
$re matched ''
match '1'
group is optional, matched one time
$re matched '01'
match '1'
group is optional, matched one time
$re matched '0011'
anchor to end
match complete: matched '0011' from start
Similarly, you could (foolishly) replace /0+/ with a recursive regexp like: $re = qr{0(??{$re})?};
which would work, but the following would fall into infinite recursion: $re = qr{(??{$re})?0};
Hope this helps,
Hugo | [reply] [d/l] [select] |
|
Re: A (non) reg-ex question
by salva (Canon) on Mar 20, 2006 at 13:03 UTC
|
tr/0// == tr/1// and index $_, '10' < 0
else:
my $hl = (1+length) >> 1;
tr/0// == $hl and tr/1// == $hl and index $_, '10' < 0
or just
/^(0*)(1*)$/ and length $1 == length $2
what if I only wanted the number of 0s and the number of 1s to be equal?
tr/0// == tr/1//
| [reply] [d/l] [select] |
Re: A (non) reg-ex question
by mickeyn (Priest) on Mar 20, 2006 at 14:10 UTC
|
$str =~ /(0+)(1+)/;
$is0n1n = length($1)<=>length($2); # 0 if match
Enjoy,
Mickey
| [reply] [d/l] |
Re: A (non) reg-ex question
by holli (Abbot) on Mar 20, 2006 at 13:14 UTC
|
but it's quiet ugly, do you have any alternatives?
Regexes are ugly by design ;-)
| [reply] |
Re: A (non) reg-ex question
by timos (Beadle) on Mar 20, 2006 at 12:59 UTC
|
my @foo = split /01/"00001111";
$foo[0]=~s/0/1/;
if ($foo[0] eq $foo[1]) {print "String is OK\n";}
| [reply] [d/l] |
|
"00001111"=~/^[0]+[1]+$/;
| [reply] [d/l] |
Re: A (non) reg-ex question
by exussum0 (Vicar) on Mar 20, 2006 at 22:33 UTC
|
You can't. Yes you can. You got the implemented answer. Just a blurb on why you can't, but perl breaks the rule.
The long, "you can't," answer: A regexp, an achedemic regexp, won't work. I point that out since your example is a perl regular expression. I suggest looking up the pumping lemma and the basic tenants of a regular language. In short, what you need is something that requires stack memory. A real regular expression only needs memory for the input, the expression itself, and a pointer to say where weare at in the regexp. If the number of 0's and 1's have some sort of arithmetic relation, more likely than not, you can't. There are exceptions.
The long, "you can," answer: perl's regexp's are an extension of the regular expressions used for regular languages. The fact you have the /e flag totally completely blows it out of the water since oyu can freely allocate variables. Perl's regex's are a wonderful thing. Just stating the fact you can do context sensitive stuff in it.
| [reply] |
|
| [reply] [d/l] [select] |
|
| [reply] |
Re: A (non) reg-ex question
by codeacrobat (Chaplain) on Mar 20, 2006 at 20:54 UTC
|
Another way would be to use Parse::RecDescent.
#!/usr/bin/perl -w
use strict;
use Parse::RecDescent;
my $parser = new Parse::RecDescent (q{
L: '0' L '1'
L: ''
});
my $string1 = "00111";
my $string2 = "0011";
print $parser->L(\$string1) && $string1 eq '' ? "" : "not ", "in L\n";
+ # not in L
print $parser->L(\$string2) && $string2 eq '' ? "" : "not ", "in L\n";
+ # in L
I am a Parse::RecDescent newbie. Does someone know to write the evaluation in a better way? | [reply] [d/l] |
Re: A (non) reg-ex question
by Anonymous Monk on Mar 20, 2006 at 16:00 UTC
|
Assuming the language consists of 1s and 0s...
sub belongs_to_L {
my ($s) = @_;
my ($num_zeros,$num_ones,$junk);
($num_zeros,$num_ones,$junk) = ( $s =~ m/^(0*)(1*)(.*)/);
return 1 if ( length($zeros) == length($ones) and
length($junk) == 0);
return 0;
}
| [reply] [d/l] |