Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re^3: In search of an efficient query abstractor

by gone2015 (Deacon)
on Dec 07, 2008 at 20:44 UTC ( [id://728783]=note: print w/replies, xml ) Need Help??


in reply to Re^2: In search of an efficient query abstractor
in thread In search of an efficient query abstractor

Can you identify token separators, and break the input up into stuff which isn't a problem, and stuff which might be ?

Starting by tidying up:

$query =~ s/\s+/ /g ; # that's the whitespace $query =~ s/\A\s// ; # strip leading $query =~ s/\s\Z// ; # strip trailing $query = lc($query) ; # all lower case $query =~ s/(["'])((?:\\\1|\1\1|.)*?)\1/mash_s($1, $2)/eg ; # Eliminate separators from quoted string +s sub mash_s { my ($q, $s) = @_ ; $s =~ tr/0-9a-z/\\/c ; return $q.$s.$q ; } ;
which, in particular, leaves all "..." or '...' strings containing only [0-9a-z\\]. Means that can then attack anything between separator characters:
$query =~ s/([^ !#\$%()*,\/:;<=>?\@[\]^{|}~]+)/mash_l($1)/eg ; sub mash_l { my ($s) = @_ ; return $s if $s =~ /^(?:[a-z]+|\+|\-)$/ ; return 'N' if $s =~ /^[+-]?(?: (?:\d+(?:\.\d*)? | \.\d+) (?:e[+-]\d+ +)? |(?:0(?: x[0-9a-f]+ |b[01]+ ) ) |x'[0-9a-f]+' |b'[01]+' )$/x ; return 'S' if $s =~ /^(["']).*?\1$/ ; return $s ; } ;
Sadly, what this shows most clearly is that distinguishing unary and binary '+' and '-' is tricky. The above will cope with 12 + -17 and 12*-5, but will fail on 12+13 or 12 +-13 and so on...

...using a parser, where somebody else has done all the hard work, looks like a good trick !

Replies are listed 'Best First'.
Re^4: In search of an efficient query abstractor
by xaprb (Scribe) on Dec 07, 2008 at 21:07 UTC

    About unary/binary: I had the same thought while sketching out a state machine. Obviously you have to keep some context to know which is which. I'm thinking that brute-forcing and just treating such an expression as a number is acceptable for this log analysis. I mean,

    select 5 + 1; select 6; select 8 + 1+-5;

    From the point of view of log analysis, those statements are all similar. Selecting a number is selecting a number, mush them all together and report on them in aggregate.

    Of course that's not strictly true. You might have a silly application that constantly does "select 5" and not so frequently does "select 5 + 5" and you want to be able to distinguish them so you can find the offending code that's causing the first query. But that's a corner case.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://728783]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (5)
As of 2024-04-19 13:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found