Re: need to parse firts part of SQL-query (regex question)

http://qs321.pair.com?node_id=662894

in reply to need to parse firts part of SQL-query (regex question)

To expand on roboticus's comments - there is a SQL::Tokenize (I think that's the name) module on CPAN which works pretty well at tokenizing a SQL query. You could use that as a starting point.

Michael

Comment on Re: need to parse firts part of SQL-query (regex question)

Replies are listed 'Best First'.
Re^2: need to parse firts part of SQL-query (regex question) by Not_a_Number (Prior) on Jan 17, 2008 at 17:05 UTC
Do you mean SQL::Tokenizer? That doesn't seem to do what the OP wants: `use SQL::Tokenizer; my $query = q{f1,f2, SUM(f3),CONCAT(f4,f5, f6), f7}; my @tokens = SQL::Tokenizer->tokenize($query); print join "\n", @tokens; __END__ f1 , f2 , SUM ( f3 ) , CONCAT ( f4 , f5 , f6 ) , f7` [download]	[reply] [d/l]
Re^3: need to parse firts part of SQL-query (regex question) by grinder (Bishop) on Jan 17, 2008 at 17:22 UTC
That doesn't seem to do what the OP wants Come now young man, where's your sense of adventure? With a bit of lookahead and a state machine you can easily massage the token stream into something useful: `use SQL::Tokenizer; my $query = q{f1,f2, SUM(f3),CONCAT(f4,f5, f6), sum((f1+f2)f3)}; my @token = SQL::Tokenizer->tokenize($query); my $paren_depth = 0; my $cache = ''; while(my $val = shift @token) { if ($token[0] eq '(') { $paren_depth++; } if ($val eq ')') { $paren_depth--; if ($paren_depth == 0) { print $cache; $cache = ''; } } if ($paren_depth) { $cache .= $val; } else { print "$val\n"; } } __PRODUCES__ f1 , f2 , SUM(f3) , CONCAT(f4,f5, f6) , sum((f1+f2)f3)` [download] That's not too shabby. The tokenizer does the heavy lifting, you just have to put the pieces back together again. • another intruder with the mooring in the heart of the Perl	[reply] [d/l]
Re^4: need to parse firts part of SQL-query (regex question) by Not_a_Number (Prior) on Jan 17, 2008 at 17:37 UTC
grinder++ ! It's rather like a solution I came up with, albeit without using the module in question: `my $str = 'f1,f2, SUM(f3),CONCAT(f4,f5, f6), f7'; my ( $tok, @toks, $parens ); while ( $str ) { my $char = substr $str, 0, 1, ''; $char eq ' ' and next; $char eq '(' and $parens++; $char eq ')' and $parens--; $char eq ',' && ! $parens and push( @toks, $tok ), $tok = '', next; $tok .= $char; push @toks, $tok if ! $str; } print join ' -- ', @toks;` [download] ...but I didn't want to post it for fear that it wouldn't be very robust. :)	[reply] [d/l]
Re^5: need to parse firts part of SQL-query (regex question) by grinder (Bishop) on Jan 17, 2008 at 18:12 UTC
Re^3: need to parse firts part of SQL-query (regex question) by mpeppler (Vicar) on Jan 18, 2008 at 08:11 UTC
Agreed - but this seems to me to be a good starting place. I used this to parse SQL source files for over a thousand stored procedures and check that the case of variables, columns, etc. matched when moving a system from a case-insensitive dataserver to a case-sensitive one. It required doing quite a bit of hand-coding to handle the various language elements of Transact-SQL, and ended up with about 700 lines of code to do all the checks, but I mostly got it done... Michael	[reply]
Re^2: need to parse firts part of SQL-query (regex question) by roboticus (Chancellor) on Jan 17, 2008 at 16:32 UTC
mpeppler++ That looks like a *much* better suggestion than mine! ...roboticus	[reply]

In Section Seekers of Perl Wisdom