Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re^2: need to parse firts part of SQL-query (regex question)

by Not_a_Number (Prior)
on Jan 17, 2008 at 17:05 UTC ( #662920=note: print w/replies, xml ) Need Help??


in reply to Re: need to parse firts part of SQL-query (regex question)
in thread need to parse firts part of SQL-query (regex question)

Do you mean SQL::Tokenizer? That doesn't seem to do what the OP wants:

use SQL::Tokenizer; my $query = q{f1,f2, SUM(f3),CONCAT(f4,f5, f6), f7}; my @tokens = SQL::Tokenizer->tokenize($query); print join "\n", @tokens; __END__ f1 , f2 , SUM ( f3 ) , CONCAT ( f4 , f5 , f6 ) , f7

Replies are listed 'Best First'.
Re^3: need to parse firts part of SQL-query (regex question)
by grinder (Bishop) on Jan 17, 2008 at 17:22 UTC
    That doesn't seem to do what the OP wants

    Come now young man, where's your sense of adventure? With a bit of lookahead and a state machine you can easily massage the token stream into something useful:

    use SQL::Tokenizer; my $query = q{f1,f2, SUM(f3),CONCAT(f4,f5, f6), sum((f1+f2)*f3)}; my @token = SQL::Tokenizer->tokenize($query); my $paren_depth = 0; my $cache = ''; while(my $val = shift @token) { if ($token[0] eq '(') { $paren_depth++; } if ($val eq ')') { $paren_depth--; if ($paren_depth == 0) { print $cache; $cache = ''; } } if ($paren_depth) { $cache .= $val; } else { print "$val\n"; } } __PRODUCES__ f1 , f2 , SUM(f3) , CONCAT(f4,f5, f6) , sum((f1+f2)*f3)

    That's not too shabby. The tokenizer does the heavy lifting, you just have to put the pieces back together again.

    • another intruder with the mooring in the heart of the Perl

      grinder++ !

      It's rather like a solution I came up with, albeit without using the module in question:

      my $str = 'f1,f2, SUM(f3),CONCAT(f4,f5, f6), f7'; my ( $tok, @toks, $parens ); while ( $str ) { my $char = substr $str, 0, 1, ''; $char eq ' ' and next; $char eq '(' and $parens++; $char eq ')' and $parens--; $char eq ',' && ! $parens and push( @toks, $tok ), $tok = '', next; $tok .= $char; push @toks, $tok if ! $str; } print join ' -- ', @toks;

      ...but I didn't want to post it for fear that it wouldn't be very robust. :)

        fear that it wouldn't be very robust

        Yes indeed. Real-world SQL is likely to have named the columns, so you have to deal with concat(f4, f5, f6) as "foobarquux (combined)". Before you know it, you have reimplemented SQL::Tokenizer!

        • another intruder with the mooring in the heart of the Perl

Re^3: need to parse firts part of SQL-query (regex question)
by mpeppler (Vicar) on Jan 18, 2008 at 08:11 UTC
    Agreed - but this seems to me to be a good starting place.

    I used this to parse SQL source files for over a thousand stored procedures and check that the case of variables, columns, etc. matched when moving a system from a case-insensitive dataserver to a case-sensitive one. It required doing quite a bit of hand-coding to handle the various language elements of Transact-SQL, and ended up with about 700 lines of code to do all the checks, but I mostly got it done...

    Michael

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://662920]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (4)
As of 2020-11-29 23:22 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?