Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Re^3: Conditional many to many relationships with Class::DBI

by cbraga (Pilgrim)
on Nov 01, 2004 at 14:50 UTC ( [id://404314]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Conditional many to many relationships with Class::DBI
in thread Conditional many to many relationships with Class::DBI

It isn't difficult to keep the versioning in the application level, and I believe that's the rigth thing to do. It isn't a database's job to version your data, only to store and retrieve it.

It's extra work of course, but not much extra work. And you should use a real, transactioned RDMS of course to avoid inconsistencies ;)

ESC[78;89;13p ESC[110;121;13p

  • Comment on Re^3: Conditional many to many relationships with Class::DBI

Replies are listed 'Best First'.
Re^4: Conditional many to many relationships with Class::DBI
by dragonchild (Archbishop) on Nov 01, 2004 at 15:34 UTC
    It isn't difficult to keep the versioning in the application level, and I believe that's the rigth thing to do.

    There's right, in terms of architecture, and right, in terms of performance and maintainability. While data versioning, according to some architectural viewpoints, belongs in the application, it's still, imho, a data issue. Consider how CVS and other source control applications work - the versioning is done in the engine (database, in our case), not the client (application, in our case).

    Furthermore, there is only one place that actually knows when data is changed - the database. Not all changes are guaranteed to occur through the application. In fact, most databases support more than one application, often written by different teams. By putting versioning in the common place, you gain a lot more certainty in how versioning will occur. "Write Once, Use Everywhere."

    Plus, there's the issue of how do you pull the versioned data out. If you could put it in your SQL statement, you reduce your network and application-side processing overhead, potentially by up to 90% in a heavily-used system. Plus, the RDBMS is written in highly-optimized C. Your client, in this case, is in (un)optimized Perl. I know where I'd prefer the decisions to be made.

    Additionally, this is a problem you want to have solved forever, no matter what you do. I hate having to solve the same problem over and over in an application when it should be solved in some service I'm using, whether it's in the database, Apache, mod_perl, or something else. And you should use a real, transactioned RDMS of course to avoid inconsistencies ;)

    MySQL actually has stronger default transaction support than Oracle, Sybase, and DB2. What's your point?

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      My point is that as long as MySQL ignores the SQL standard and changes my data behind my back it's only good for blogs and nothing else.

      And I disagree about your point on overhead and reducing database load by putting the "versioned data" on the SQL statement. As long as I'm specifying ... AND VERSION=xxx ... it's on the SQL statement, no need to have a new SQL dialect.

      Furthermore that's a case where the performance bottleneck is the database (for large databases), not the application or versioning code, so writing it in C won't make it noticeably faster.

      Of course you're right about only having to "solve the problem once", but that's easy. Write the version control code as stored procedures. In PostgreSQL and Oracle you can even do that in Perl. But, oh, you're using MySql, sorry.

      ESC[78;89;13p ESC[110;121;13p

        My point is that as long as MySQL ignores the SQL standard and changes my data behind my back it's only good for blogs and nothing else.

        Please provide examples to support this statement. MySQL 4.x does not ignore the SQL standard - it does what every other vendor does and that is to augment the standard. Oracle is actually the biggest culprit in this. Postgres isn't far behind, frankly.

        As for changing data ... I have never heard of this. Unless, you're talking about the fact that MySQL silently converts data to the correct datatype. First off, there is nothing said about this in the standard, other than to say that it is implementation-dependent. (I've read all 1000 pages of the SQL-92 standard ... have you?) Second, Oracle and Sybase also both do something similar in certain cases. MySQL simply chose to do it in more cases than others. The places that it does this are all documented in an extremely easy-to-read manual (which Oracle doesn't have). Not to mention that Oracle's exceptions are NOT documented in an easy-to-search fashion. Having been bitten by them time and again, I much prefer MySQL's many documented exceptions over Oracle's few undocumented exceptions.

        Write the version control code as stored procedures. In PostgreSQL and Oracle you can even do that in Perl. But, oh, you're using MySql, sorry.

        You're absolutely correct - MySQL does not have triggers or stored procedures ... yet. The plan is to have enterprise-ready triggers by Q1/Q2 next year. Oh, and it will also have seamless clustering around the same time. You can actually use an alpha version of clustering today - I'm setting it up this afternoon. When is PG supposed to have that?



        *deep breath* We're starting to get into some hot conversation ... We can continue here or via email, if you prefer. I'd rather not start a flamewar on PM, if possible.

        Being right, does not endow the right to be rude; politeness costs nothing.
        Being unknowing, is not the same as being stupid.
        Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
        Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      MySQL actually has stronger default transaction support than Oracle, Sybase, and DB2.
      Could you expand on that a bit?

      Thanks!

      Michael

        Sybase1 operates by default on Isolation Level 1, which corresponds (roughly) to MySQL's 'READ COMMITTED'. MySQL2, by default, operates on 'REPEATABLE READ', which corresponds (roughly) to Sybase's Isolation Level 2. Oracle3 operates by default on something similar to Sybase's default isolation level, as does Postgres4.

        Now, all four RDBMSes can be set to operate at the level that the SQL92 standard recommends, which is Sybase's Isolation Level 3, or MySQL's 'SERIALIZABLE'. However, only Sybase and MySQL offer the ability to operate at any of the four isolation levels. Oracle8i and Postgres only offer READ COMMITTED and SERIALIZABLE (levels 1 and 3 in Sybase).

        1. the Sybase manual
        2. the MySQL manual
        3. the Oracle manual
        4. the Postgres manual

        Note: I couldn't find the official Oracle manuals through Google. I figured something by O'Reilly would be good enough.

        Being right, does not endow the right to be rude; politeness costs nothing.
        Being unknowing, is not the same as being stupid.
        Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
        Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re^4: Conditional many to many relationships with Class::DBI
by poqui (Deacon) on Nov 01, 2004 at 22:15 UTC
    In a perfect world, I would agree with you.
    The application layer should be where all changes are handled and tracked, but too often that is not the case.
    At my company, the source system has been through several different applications, with data from each still being used by some users. Also, there are folks with direct access to the data (via sql) who perform updates directly without benefit of an application.
    That leaves me (and my teammates in the data warehouse) with 2 choices: rely on the change/update dates in the source system and just accept that there are changes we won't track, or use some kind of Changed Data Capture (CDC). We have survived for 3 years on the former, and are now trying to institute the latter.
    CDC can take several forms, but the best, in my opinion, is one that uses the database engine itself. The database will generally do logging of transactions, and scanning that transaction log for the tables you are interested in tracking is the best way to capture all changes.
    We have gone from full table scans of the last 3 days worth of changes, and still missing a good percentage of them, to capturing 100% of the changes we are interested in and loading 25% of the data we did before.

    Plus, we took the burden off the source system and put it on our newer servers.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://404314]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (1)
As of 2024-04-25 00:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found