Re: What can regular expressions NOT do?⭐
by turnstep (Parson) on Apr 25, 2000 at 23:35 UTC
|
Regexps are bad at very complicated cases, especially when
things are nested. The typical example is HTML: it's easy enough
to make a simple regexp to grab tags, but to get it really right
and account for all cases (comments, greater than signs within quotes
within HTML tags, multiline tags, etc.) you outgrow the usefullness
of a regular expression and really need something that will
actually parse the string (or file, whatever) that contains
the HTML. Luckily, there are modules that will do just that
for you.
| [reply] |
|
Wipe your ass**** after you're done shxxing
| [reply] |
Re: What can regular expressions NOT do?⭐
by buzzcutbuddha (Chaplain) on Apr 25, 2000 at 15:18 UTC
|
The question is not so much 'What Can't Regexp Do?', but when shouldn't you use it.
There are times when processing strings where it is easier and faster to use a substr
to grab what you need from input. This is especially true if you know that the length
of the input elements does not change (ie a fixed-width flat text file...). I guess the
analogy would be that if you wanted to dig postholes for a fence, you could use a shovel,
which is powerful and has many different ways to use, but is not as efficient as a posthole
digger. Don't get me wrong, I love regexps, but sometimes, they are not the best tool for
the job. :) | [reply] |
Re: What can regular expressions NOT do?⭐
by kryten (Scribe) on Apr 25, 2000 at 18:45 UTC
|
It's just like the old adage:
If all you have is a regexp, then everything
looks like a /^[Nn]ail$/
:)
| [reply] |
Re: What can regular expressions NOT do?⭐
by jbn (Initiate) on Apr 27, 2000 at 01:48 UTC
|
you cannot write a regexp that will match any
level of nested balanced parentheses. You can
always write one to match a specific number of
balanced ones, but not one that can match a non-
fixed number of these. I think there is an
explaination in the Friedl book. | [reply] |
|
Let me see if I can make this make sense with digging out a text book...
Regular expressions are implemented internally by a 'finite state automata'.
If you've never heard the term, I'll attempt to explain it...
Picture a group of circles interconnected by various lines. The circles represent
the current state, and the lines represent the next state to go to if a give input
is seen. One circle is the start state, and some other number of circles are 'end'
states (you can have more than one).
For a specific example try this:
take three circles, label them 'start', '1', and 'end'
draw an arrow from 'start' to '1', from '1' to 'end' and from 'end' to '1'
label the arrows 'a','b' and 'c' respectively.
Starting at 'start', take each character of input and follow the link with that
label to the next state.
If at the end of the input you're at the 'end' state, the this automata matches
the input.
If you have a character of input that you don't have a link for from you're current
state, or you run out of input and aren't on a 'end' state, the the automata doesn't match the input.
Using our example the following will match: 'abc','abcbc','abcbcbc'.
And these will not: 'a', 'ab','abcb','abcd','ad', etc.
(The regular expression for this automata would be /^a(bc)+$/)
At each step the only thing the automata is concerned with is what state it is in,
and what is the next character of input. The is no retained knowledge of what the
previous characters were. Since finite state automatas have no 'memory' of what
input they've seen before, they have no way of knowing if the correct number of ')'
has been found.
Hopefully that made sense, but was probably FMTYWTK
/\/\averick
| [reply] |
|
$string = "((()()())"; # one unbalanced paren
($re = "\Q$string") =~ s/\\(\()|\\(\))/$1\\$1$2$2/g;
my @a = eval { $string =~ $re };
die "Mismatched brackets in '$string'\n" if $@;
-J.
| [reply] [d/l] |
Re: What can regular expressions NOT do?⭐
by ton (Friar) on Apr 03, 2001 at 23:23 UTC
|
Also cannot solve the "dangling else" problem. That is, if you're checking a piece of code with a bunch of "if" statements which may or may not be followed by "else" statements, it is impossible to write a regexp that will find all occurances of "elses" without matching "ifs" (e.g. the point at which the number of elses exceeds the number of ifs). Note that this is trivially easy to do with a loop.
There are a bunch more things also; have to dig up my compiler books to list them, though :) | [reply] |
Re: What can regular expressions NOT do?
by Anonymous Monk on May 20, 2002 at 05:16 UTC
|
It is because regular expressions merely have the power of finite state machines. These finite state machines have NO memory.... how could it match equal numbers of an indeterminate quantity without MEMORY. It can't.
That's why you would need the power of a more expressive grammar ... that is that which can be expressed by a pushdown automota. With a pushdown, we have a "stack" of memory that we can save about each state. Clearly, you can see how the matched parenthesis problem becomes very simple with such a mechanism at hand. | [reply] |
Re: What can regular expressions NOT do?
by I0 (Priest) on May 20, 2002 at 20:51 UTC
|
But Perl's m// operator is a more expressive grammar than formal regular expressions.
For example, \1 allows you to detect primes:
print "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/
And s/// can be used to detect balanced parenthesis:
$_ = '(()(()()))';
(my $re=$_)=~s/((\()|(\))|.)/${[')','']}[!$3]\Q$1\E${['(','']}[!$2]/gs
+;
eval{/$re/};
print "Balanced" unless $@ =~ /unmatched/;
| [reply] [d/l] [select] |
Re: What can regular expressions NOT do?
by Anonymous Monk on Oct 21, 2003 at 08:10 UTC
|
| [reply] |