Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Re: Regexp generating strings?

by CombatSquirrel (Hermit)
on Aug 18, 2003 at 16:17 UTC ( [id://284626]=note: print w/replies, xml ) Need Help??


in reply to Regexp generating strings?

I liked the problem and so I tried to come up with a solution to it. Because I didn't like the idea of having to re-invent the Perl RegEx parser completely, there are a number of limitations to my program:
Only the following are allowed:
  • literal characters
  • capturing parens
  • OR (|)
  • the following quantifiers in greedy forms:
    • ?
    • {x,y} only with x and y specified
This means especially that the following are not allowed:
  • character classes, including ISO ones
  • escaped characters
  • star and plus
  • the almighty dot
  • lookahead and lookbehind
  • and many others...
And the code is, of course, not optimized, and I am not copletely sure whether it is completely bug-free. Any comments are welcome and I would also be highly interested in a Perlgolf version of this one :-). Well, here it is:
#!perl use strict; use warnings; sub ParensMatch { my @string = split //, shift; my $num = 0; for (@string) { if ($_ eq '(') { ++$num } elsif ($_ eq ')') { --$num; return undef if ($num < 0) } } ($num == 0) ? return 'match' : return undef; } sub OrOutsideParens { my @segments = split /\|/, shift; return undef if (@segments == 1); my $tot = ''; my $num = @segments; for (@segments) { return undef if (--$num == 0); $tot .= $_; return 'Yes' if (ParensMatch($tot)); } } sub Combinations { return $_[0] if (@_ == 1); return '' if (@_ < 1); @_ = map { ($_ ne '' and $_ =~ /\|/) ? [split(/\|/, $_)] : [$_] } @ +_; while (@_ > 1) { my $second = pop; my $first = pop; my $tot = []; for my $fval (@$first) { for (@$second) { push @$tot, $fval ? $_ ? "$fval$_" : $fval : $_ ? $_ : ''; } } push @_, $tot; } return join('|', @{$_[0]}); } sub ParseRegex { my $regex = shift; if (defined $regex and $regex =~ /\|/ and OrOutsideParens($regex)) +{ my @snippets = split /\|/, $regex; my $cur = ''; my @regsnipp; for (@snippets) { $cur .= '|' unless ($cur eq ''); $cur .= $_; if (ParensMatch($cur)) { push @regsnipp, ParseRegex($cur); $cur = ''; } } die 'Unmatched | in RegEx' if ($cur ne ''); $regex = join '|', @regsnipp; } elsif (defined $regex and $regex =~ /\((.*)\)((\{(\d+),(\d+)\})|\ +?)?/) { my ($before, $after, $first, $second, $third, $fourth, $fifth) = + ($`, $', $1, $2, $3, $4, $5); my $parsedRegex = ParseRegex($first); if ($second) { if ($third) { $regex = Combinations(($parsedRegex) x $fourth); $parsedRegex = join '|', map { Combinations($regex, ($pars +edRegex) x $_) } (0..$fifth - $fourth); } else { $parsedRegex = "|$parsedRegex"; } } $regex = Combinations(ParseRegex($before), $parsedRegex, ParseRe +gex($after)); } elsif (defined $regex and $regex =~ /\{(\d+),(\d+)\}/) { my ($before, $after) = ($`, $'); ($before, $after) = (ParseRegex($before), ParseRegex($after)); my $parsedMinimum = Combinations(($before) x $4); $regex = Combinations(join '|', map { Combinations($parsedMinimu +m, ($before) x $_) } (0..$2 - $1), $after); } elsif (defined $regex and $regex =~ /(.)\?/) { my ($before, $after, $first) = ($`, $', $1); ($before, $after) = (ParseRegex($before), ParseRegex($after)); $regex = Combinations($before, $after) . '|' . Combinations($bef +ore, $first, $after); } return $regex; } sub getRegexStrings { my $regex = shift; my %seen = map { $_ => 1 } split(/\|/, ParseRegex($regex)); return join($/, sort keys %seen); } for my $regex (<DATA>) { chomp $regex; if (ParensMatch($regex)) { print "$/RegEx: <$regex>$/"; print "Matching strings:$/$/" . getRegexStrings($regex) . $/ x 2 +; } else { print "$/Mismatched parens in RegEx <$regex>$/$/"; } } __DATA__ ab?(c|d){0,3} abac? (a|(b(c|d)))d?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://284626]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (3)
As of 2024-04-24 02:32 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found