http://qs321.pair.com?node_id=662894


in reply to need to parse firts part of SQL-query (regex question)

To expand on roboticus's comments - there is a SQL::Tokenize (I think that's the name) module on CPAN which works pretty well at tokenizing a SQL query. You could use that as a starting point.

Michael

Re^2: need to parse firts part of SQL-query (regex question)
by Not_a_Number (Prior) on Jan 17, 2008 at 17:05 UTC

    Do you mean SQL::Tokenizer? That doesn't seem to do what the OP wants:

        use SQL::Tokenizer;

        my $query  = q{f1,f2, SUM(f3),CONCAT(f4,f5, f6), f7};
        my @tokens = SQL::Tokenizer->tokenize($query);

        print join "\n", @tokens;

        __END__
        f1 , f2 , SUM ( f3 ) , CONCAT ( f4 , f5 , f6 ) , f7

      That doesn't seem to do what the OP wants

      Come now young man, where's your sense of adventure? With a bit of lookahead and a state machine you can easily massage the token stream into something useful:

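          use SQL::Tokenizer;

          my $query = q{f1,f2, SUM(f3),CONCAT(f4,f5, f6), sum((f1+f2)*f3)};
          my @token = SQL::Tokenizer->tokenize($query);

          my $paren_depth = 0;
          my $cache       = '';

          # defined() keeps the loop alive for false-valued tokens such as '0'
          while (defined(my $val = shift @token)) {
              # lookahead: the next token opens a paren, so start or deepen a group
              if (@token and $token[0] eq '(') {
                  $paren_depth++;
              }
              if ($val eq ')') {
                  $paren_depth--;
                  # back at top level: flush the collected group
                  if ($paren_depth == 0) {
                      print $cache;
                      $cache = '';
                  }
              }
              if ($paren_depth) {
                  $cache .= $val;     # inside parens: keep collecting
              }
              else {
                  print "$val\n";     # top level: print the token
              }
          }

          __PRODUCES__
          f1 , f2 , SUM(f3) , CONCAT(f4,f5, f6) , sum((f1+f2)*f3)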

      That's not too shabby. The tokenizer does the heavy lifting, you just have to put the pieces back together again.
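If you want the fields back as a list rather than printed, the same idea can be wrapped in a sub. This is only a sketch: `group_fields` is a hypothetical helper, not part of SQL::Tokenizer, and the token list is written out by hand (whitespace tokens included, as the output above suggests) so the example runs without the module. It counts depth on the current token instead of peeking at the next one, and it drops the top-level commas.

```perl
use strict;
use warnings;

# Hypothetical helper: regroup a flat token list into top-level fields.
sub group_fields {
    my @tokens = @_;
    my $depth  = 0;
    my $field  = '';
    my @fields;

    for my $tok (@tokens) {
        # a comma outside all parentheses ends the current field
        if ( $tok eq ',' && $depth == 0 ) {
            push @fields, $field;
            $field = '';
            next;
        }
        $depth++ if $tok eq '(';
        $depth-- if $tok eq ')';
        $field .= $tok;
    }
    push @fields, $field;

    # trim surrounding whitespace, drop empty fields (e.g. trailing comma)
    s/^\s+|\s+$//g for @fields;
    return grep { length } @fields;
}

# Hand-written stand-in for the tokenizer's output (an assumption).
my @tokens = (
    'f1', ',', 'f2', ',', ' ', 'SUM', '(', 'f3', ')', ',',
    'CONCAT', '(', 'f4', ',', 'f5', ',', ' ', 'f6', ')', ',', ' ',
    'sum', '(', '(', 'f1', '+', 'f2', ')', '*', 'f3', ')',
);

my @fields = group_fields(@tokens);
print "$_\n" for @fields;
```

Counting on the current token also copes with a query that starts with an opening parenthesis, which the lookahead version never sees as "next".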

      • another intruder with the mooring in the heart of the Perl

        grinder++ !

        It's rather like a solution I came up with, albeit without using the module in question:

            my $str = 'f1,f2, SUM(f3),CONCAT(f4,f5, f6), f7';
            my ( $tok, @toks, $parens );

            while ( $str ) {
                my $char = substr $str, 0, 1, '';
                $char eq ' ' and next;
                $char eq '(' and $parens++;
                $char eq ')' and $parens--;
                $char eq ',' && ! $parens
                    and push( @toks, $tok ), $tok = '', next;
                $tok .= $char;
                push @toks, $tok if ! $str;
            }

            print join ' -- ', @toks;

        ...but I didn't want to post it for fear that it wouldn't be very robust. :)
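One way to firm up the character-by-character approach is to treat single-quoted string literals as opaque, so a comma or parenthesis inside a literal is never mistaken for a separator. The sketch below is one possible take, not a tested SQL parser: `split_fields` is a made-up name, and it assumes literals use single quotes with `''` as the escaped quote. Unlike the snippet above, it keeps spaces inside a field and only trims them at the edges.

```perl
use strict;
use warnings;

# Hypothetical helper: split a select list on top-level commas,
# ignoring commas inside parentheses or single-quoted literals.
sub split_fields {
    my ($str) = @_;
    my ( $depth, $in_quote, $tok ) = ( 0, 0, '' );
    my @toks;

    for my $char ( split //, $str ) {
        if ($in_quote) {
            # a doubled '' simply closes and reopens the literal
            $in_quote = 0 if $char eq "'";
        }
        elsif ( $char eq "'" ) { $in_quote = 1 }
        elsif ( $char eq '(' ) { $depth++ }
        elsif ( $char eq ')' ) { $depth-- }
        elsif ( $char eq ',' && $depth == 0 ) {
            push @toks, $tok;    # top-level comma: field is complete
            $tok = '';
            next;
        }
        $tok .= $char;
    }
    push @toks, $tok;

    s/^\s+|\s+$//g for @toks;    # trim each field
    return @toks;
}

my @toks = split_fields(q{f1, CONCAT(f4, ', ', f5), 'a,b', f7});
print join( ' -- ', @toks ), "\n";
```

It still knows nothing about double-quoted identifiers, comments, or unbalanced parentheses, so treat it as a starting point only.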

      Agreed - but this seems to me to be a good starting place.

      I used this to parse SQL source files for over a thousand stored procedures and check that the case of variables, columns, etc. matched when moving a system from a case-insensitive dataserver to a case-sensitive one. It required doing quite a bit of hand-coding to handle the various language elements of Transact-SQL, and ended up with about 700 lines of code to do all the checks, but I mostly got it done...

      Michael

Re^2: need to parse firts part of SQL-query (regex question)
by roboticus (Chancellor) on Jan 17, 2008 at 16:32 UTC