Can you identify the token separators, and break the input up into stuff which isn't a problem and stuff which might be?
Starting by tidying up:
$query =~ s/\s+/ /g ; # collapse each run of whitespace to one space
$query =~ s/\A\s// ; # strip leading
$query =~ s/\s\Z// ; # strip trailing
$query = lc($query) ; # all lower case
$query =~ s/(["'])((?:\\\1|\1\1|.)*?)\1/mash_s($1, $2)/eg ;
# Eliminate separators from quoted strings
sub mash_s {
  my ($q, $s) = @_ ;
  $s =~ tr/0-9a-z/\\/c ; # every character outside 0-9a-z becomes a backslash
  return $q.$s.$q ;
} ;
which, in particular, leaves all "..." or '...' strings containing only [0-9a-z\\]. That means we can then attack anything between separator characters:
$query =~ s/([^ !#\$%()*,\/:;<=>?\@[\]^{|}~]+)/mash_l($1)/eg ;
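sub mash_l {
  my ($s) = @_ ;
  # bare words and lone signs pass through unchanged
  return $s if $s =~ /^(?:[a-z]+|\+|\-)$/ ;
  # optionally signed decimal, 0x.. or 0b.., or a quoted hex or bit literal
  return 'N' if $s =~ /^[+-]?(?:
      (?:\d+(?:\.\d*)? | \.\d+) (?:e[+-]\d+)?
    | (?:0(?:
          x[0-9a-f]+
        | b[01]+
        )
      )
    | x'[0-9a-f]+'
    | b'[01]+'
    )$/x ;
  # anything still wrapped in matching quotes is a string
  return 'S' if $s =~ /^(["']).*?\1$/ ;
  return $s ;
} ;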
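To see the two passes working together, here is a minimal sketch that wraps the statements above in a sub and runs it over a few inputs. The name normalise and the sample query are made up for illustration; mash_s and mash_l are as above, with the wrapped lines rejoined.

```perl
use strict;
use warnings;

sub mash_s {
    my ($q, $s) = @_;
    $s =~ tr/0-9a-z/\\/c;    # anything outside 0-9a-z becomes a backslash
    return $q . $s . $q;
}

sub mash_l {
    my ($s) = @_;
    return $s  if $s =~ /^(?:[a-z]+|\+|\-)$/;
    return 'N' if $s =~ /^[+-]?(?:
          (?:\d+(?:\.\d*)? | \.\d+) (?:e[+-]\d+)?
        | 0(?:x[0-9a-f]+|b[01]+)
        | x'[0-9a-f]+'
        | b'[01]+'
        )$/x;
    return 'S' if $s =~ /^(["']).*?\1$/;
    return $s;
}

# Hypothetical wrapper (not in the original post): tidy up, then run both passes.
sub normalise {
    my ($query) = @_;
    $query =~ s/\s+/ /g;
    $query =~ s/\A\s//;
    $query =~ s/\s\Z//;
    $query = lc($query);
    $query =~ s/(["'])((?:\\\1|\1\1|.)*?)\1/mash_s($1, $2)/eg;
    $query =~ s/([^ !#\$%()*,\/:;<=>?\@[\]^{|}~]+)/mash_l($1)/eg;
    return $query;
}

print normalise($_), "\n" for
    q{SELECT name FROM t WHERE id = 42 AND note = 'a, b; c'},
    '12 + -17',
    '12*-5',
    '12+13',
    '12 +-13';

# prints:
#   select name from t where id = N and note = S
#   N + N
#   N*N
#   12+13
#   N +-13
```

The first three lines come out as hoped. The last two come back with their numbers intact, because '+' and '-' are not in the separator class.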
Sadly, what this shows most clearly is that distinguishing unary and binary '+' and '-' is tricky. The above will cope with 12 + -17 and 12*-5, but will fail on 12+13 or 12 +-13 (neither sign is a separator, so 12+13 and +-13 each reach mash_l as a single token) and so on...
...using a parser, where somebody else has done all the hard work, looks like a good trick !
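For what it's worth, the piece of context a parser supplies here can be sketched for the arithmetic case alone: a '+' or '-' is unary when an operand is expected (at the start, or after an operator or an opening bracket) and binary otherwise. The following is only a rough illustration under that assumption. The name fold_signs is hypothetical, and it handles just numbers, bare identifiers and single-character operators on already lower-cased input, with no strings, no exponents and no multi-character operators.

```perl
use strict;
use warnings;

# Hypothetical helper: replace (possibly signed) numbers with 'N', deciding
# whether each + or - is unary by tracking what kind of token is expected next.
sub fold_signs {
    my ($expr) = @_;
    my @out;
    my @signs;               # unary signs seen but not yet emitted
    my $want_operand = 1;    # true at the start and after an operator or '('

    for my $tok ($expr =~ /(\d+(?:\.\d+)?|[a-z_][a-z0-9_]*|\S)/g) {
        if ($want_operand && $tok =~ /^[+-]$/) {
            push @signs, $tok;       # unary: hold it until we see what follows
            next;
        }
        if ($tok =~ /^\d/) {
            push @out, 'N';          # pending signs are absorbed into the number
        }
        else {
            push @out, @signs, $tok; # keep signs in front of anything else
        }
        @signs = ();
        # after a number, an identifier or ')', the next + or - is binary
        $want_operand = ($tok !~ /^[0-9a-z_)]/);
    }
    return join ' ', @out, @signs;   # emit any signs left dangling at the end
}

print fold_signs($_), "\n" for '12+13', '12 +-13', '12*-5', 'a - -1';

# prints:
#   N + N
#   N + N
#   N * N
#   a - N
```

Even this small amount of state is more than a pair of substitutions can carry comfortably, which is rather the argument for letting a real parser do it.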