Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Intercepting critical SQL inside DBI.pm

by LanX (Saint)
on Sep 15, 2022 at 15:59 UTC ( [id://11146896]=perlquestion: print w/replies, xml ) Need Help??

LanX has asked for the wisdom of the Perl Monks concerning the following question:

Hi

The "on-topic" Perl question

I'm looking for a clean ways to intercept SQLs prepared/executed by DBI.pm and log the caller trace of certain expressions (uses of subqueries with ORDER-BY clauses).

The idea is to be alarmed about critical code dynamically.

I'm thinking of using an regex to find all queries where SELECT is not at the start and ORDER or GROUP are used and write the caller chain to a log-file.

SQL::Parser could be used for an in detail introspection then.

Before reinventing the wheel I'd like to ask for your opinion and ready to use solutions to monitor "bad" SQL.°

readmore for the SQL background problem:

Short background

My client is confronted with some inconsistencies in MariaDB 10.6 which boils down to the fact that the SQL standard doesn't allow a subqueries to have an ORDER BY clause.

MariaDB allows ORDER-BY but (often) ignores them as long they are not LIMITed. No error is thrown if the LIMIT is missing.

A large part of our code-base was written with MariaDB 5.X when this wasn't an issue.

The longer read ...

One of my products uses an emulation of ROW_NUMBER() written similar to this one in a subqueries inside an INSERT or REPLACE statement. After updating we noticed that this randomly started to fail, but I was able to fix that specific code with teh actual ROW_NUMBER() which is available with other "window functions" since MariaDB 10.2

But we are concerned that other subqueries are effected and nailed it down to this bug-report

  • https://mariadb.com/kb/en/why-is-order-by-in-a-from-subquery-ignored/

    A "table" (and subquery in the FROM clause too) is - according to the SQL standard - an unordered set of rows. Rows in a table (or in a subquery in the FROM clause) do not come in any specific order. That's why the optimizer can ignore the ORDER BY clause that you have specified. In fact, the SQL standard does not even allow the ORDER BY clause to appear in this subquery (we allow it, because ORDER BY ... LIMIT ... changes the result, the set of rows, not only their order).

    You need to treat the subquery in the FROM clause, as a set of rows in some unspecified and undefined order, and put the ORDER BY on the top-level SELECT.

It beats me why Maria-DB is not throwing an error if the LIMIT is missing in an "ordered" subquery. ²

I also identified a global setting for concurrent inserts which defaults to AUTO and might cause INSERT-SELECTs to ignore order in subqueries.

This is in contrast to MySQL REFERENCE documentation which recommends explicitly using ORDER with INSERT-SELECTs to guaranty some replication scenarios to work. oO

  • https://dev.mysql.com/doc/refman/8.0/en/insert-select.html

    The order in which a SELECT or TABLE statement with no ORDER BY clause returns rows is nondeterministic. This means that, when using replication, there is no guarantee that such a SELECT returns rows in the same order on the source and the replica, which can lead to inconsistencies between them. To prevent this from occurring, always write INSERT ... SELECT or INSERT ... TABLE statements that are to be replicated using an ORDER BY clause that produces the same row order on the source and the replica. See also Section 17.5.1.18, “Replication and LIMIT”.

It's hard to come by a clean SQL-documentation or even SQL standards, but this quote seems to indicate that a query inside INSERT-SELECT is NOT considered a subquery and hence should be exempt from the order exception above(?)³

I'm planning to discuss this on an SQL board on Reddit or Stack-Overflow, just didn't want to keep you in the dark about this mess.

updates

°) found this on SO how-to-intercept-queries-in-a-subclassed-dbi

²) compare how MS-SQL is handling that: ORDER BY can only be specified when TOP is also specified.

³) or is this a fuzzy way to say to always combine a LIMIT with ORDER-BY to allow replication to stay reliable?

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

Replies are listed 'Best First'.
Re: Intercepting critical SQL inside DBI.pm
by kikuchiyo (Hermit) on Sep 16, 2022 at 09:15 UTC

    If the solution outlined in the SO answer doesn't work out, here is an out-of-the-box idea: write a small module in C that wraps the low level MySQL/MariaDB library routine that actually sends the query to the database, and use LD_PRELOAD to enable/force it.

    This is obviously an incomplete solution, because you won't have any information about the Perl call stack that produced the query, but if you have adequate logging in the Perl side of the application, you might be able to tie the query to a location in the code based on timestamps.

    On the other hand, with this method you don't have do modify any application or library code on your production servers, you just have to put the .so on the server and modify the starter script/environment. Furthermore, you are guaranteed to catch the query just before it is sent to the DB server.

      I'm reluctant to dabble with C, and this wouldn't survive many maintenance cycles.

      And I really need the callers dumped, correlating timestamps is not really an option.

      3 more options come to mind

      1. DBI's trace documents a way to tie the filehandle and intercept messages. Probably I can also access the original callers.
      2. perlrun and perldebug list several of Perl's trace options, I know from previous experiments that arguments are listed too in the stacktrace.
      3. writing a proxy object for DBI using AUTOLOAD may be more generic than sub-classing. I could dynamically decide what's intercepted, and if another object is returned I'd wrap it into just another proxy. I 'm sure hope a search on CPAN will show previous attempts to do this.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        Writing a proxy object for DBI is not enough as you have to write proxies for the database handle and database statement as well.

        A very simple approach is to enable the tracing in DBI, by calling DBI->trace('2|SQL|2', 'c:/Temp/DBI-Trace.txt') or setting $ENV{DBI_TRACE}='2|SQL|2'. This still has the drawback of writing to the filehandle. YOu could then use a tied filehandle maybe to analyze the lines written to it and use caller() repeatedly to identify the place where the statement was created from (by looking for the first non-DBD, non-DBI package in the calltree).

        The DBI documentation claims that this is only implemented by some DBDs, but maybe MySQL/MariaDB have that feature...

Re: Intercepting critical SQL inside DBI.pm
by LanX (Saint) on Feb 22, 2023 at 14:36 UTC
    For the record:

    I just stumbled over the documentation for callbacks in DBI

    https://metacpan.org/pod/DBI#Callbacks

    It's a powerful way to inspect method calls on all handles.

    It even provides a way to define callbacks an derived handles such that a database-handle $dbh defines callbacks for all children statement-handles $sth

    DBI is impressively feature rich, no need to hack the modules (once you managed to read all the documentation ;-)

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11146896]
Approved by choroba
Front-paged by Discipulus
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (1)
As of 2024-04-25 03:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found