 
PerlMonks  

Re: Basic Perl trumps DBI? Or my poor DB design?

by punch_card_don (Curate)
on Oct 25, 2004 at 22:28 UTC ( [id://402370] )


in reply to Basic Perl trumps DBI? Or my poor DB design?

Holy cow - thanks, all, for the input. This is proving extraordinarily interesting.

OK, I've pretty much digested the common thread of the replies: so long as the data fits in memory, there should be no surprise that a custom-coded Perl solution beats an overhead-laden RDBMS solution. The question is a strategic one: trade flexibility, robustness, and standardization for speed, or not?

demerphq's ideas appear to closely resemble my DB3 - splitting the data into enough MySQL tables to have every column indexed, except that I went to the extreme of one column per table. We got the result he expected: DB3 was the fastest of all the MySQL-based queries. The mere fact that he suggested something like this makes me think I should keep the approach in the back of my mind; maybe sometimes odd design can justify itself.

BrowserUK - you are a coding machine! But I'm not clear on how we went from talking about 250 milliseconds per query in paragraph 2 to 250 queries per millisecond in paragraph 4. I do very much like your string-row solution. Again, it's giving me ideas about reading up on piddles. Do I understand correctly that the trial times are for a single execution? If so, it looks like your strategy applied to 3,000 individual text tables is still holding the speed record. Thanks.
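To make sure I understand the string-row idea, here's a toy sketch of how I read it (my own reconstruction, not BrowserUK's actual code; the question count and sample data are invented):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy sketch of a string-per-row bitmap query. Each respondent's
# yes/no answers are packed into a bit string; a query is a bit
# mask ANDed against each row.

my $NQ = 8;    # questions per respondent (assumption)

# Pack a list of 0/1 answers into a bit string.
sub pack_answers {
    my @answers = @_;
    my $bits = "\0" x ( ( $NQ + 7 ) >> 3 );
    vec( $bits, $_, 1 ) = $answers[$_] for 0 .. $#answers;
    return $bits;
}

# Build a mask requiring 'yes' on the given question numbers.
sub mask_for {
    my @questions = @_;
    my $mask = "\0" x ( ( $NQ + 7 ) >> 3 );
    vec( $mask, $_, 1 ) = 1 for @questions;
    return $mask;
}

my @rows = (
    pack_answers( 1, 0, 1, 0, 0, 1, 0, 0 ),    # respondent 0
    pack_answers( 0, 1, 1, 0, 1, 1, 0, 0 ),    # respondent 1
    pack_answers( 1, 1, 0, 0, 1, 0, 0, 0 ),    # respondent 2
);

# Who said yes to BOTH question 2 and question 5?
# Bitwise & on strings ANDs the whole row in one operation.
my $mask = mask_for( 2, 5 );
my @hits = grep { ( $rows[$_] & $mask ) eq $mask } 0 .. $#rows;

print "Matches: @hits\n";    # respondents 0 and 1
```

The appeal is that one string AND tests every question in a row at once, with no per-column indexes at all.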

I think then what I have in DB4 is a baseline. This is as fast as you could possibly hope for, so shoot for the closest to that within a MySQL solution. At least with a baseline for comparison we know how 'good' a solution is. I wonder if a hand-coded Perl query system might make a good standard practice in database development when the data lends itself to it, so that developers know how fast is fast in a particular situation?

The various treatments of DB1 & 2, including dragonchild's, make me really want to see that schema approach the baseline. I think it might be the best compromise.

So, I'm embarking on a matrix of tests on that schema. The variables are:

  • order of declaration of columns
  • order of declaration of the compound primary key
  • order of declaration of KEYs
  • order of population of the series (which for-loop goes inside which)

I count a total of 24 permutations, including ones with no KEYs. This will take a day or two to find time for; then I'll start a new thread.

Then we'll make a strategic choice.

Thanks all for your participation.


Replies are listed 'Best First'.
Re^2: Basic Perl trumps DBI? Or my poor DB design?
by BrowserUk (Patriarch) on Oct 25, 2004 at 23:33 UTC
    But I'm not clear on how we went from talking about 250 milliseconds per query in paragraph 2, then 250 queries per millisecond in paragraph 4

    You're right. The post was written in two stages. Originally it was based on a few lines of code that I threw together to test the idea out. No subroutines (or their overhead). Only positive match detection. Much smaller datasets. It worked, and I started writing the post on that basis. Then I realised that it was way too limited in the types of questions it could answer, and the hard-coded scale of the test was limiting, so I went back and improved things.

    The numbers in paragraph 4 are leftovers from the original, artificially simpler, but faster tests. I will update the node.

    As an aside, the same technique can be applied even to datasets where the answers are not yes/no, provided the range of answers can be reduced to a reasonable range of discrete values. I.e., multiple choice, as you are doing.

    All too often you see applications storing things like dates, ages, and other continuously variable numeric values when all the application really requires is "under 18 | over 18 and under 65 | over 65" or similar, which can easily be replaced by an enumeration. Many DBs can handle these quite efficiently.
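    For instance, the age case could be collapsed to a three-value enumeration at load time (a minimal sketch of the idea; the bucket boundaries are just the ones above, and the names are invented):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Minimal sketch: collapse a continuous age into the three-way
# enumeration mentioned above, so the DB stores (and indexes)
# a small discrete value instead of the raw number.

my @BUCKETS = ( 'under_18', '18_to_64', '65_plus' );

sub age_bucket {
    my ($age) = @_;
    return $age < 18 ? 0
         : $age < 65 ? 1
         :             2;    # index into @BUCKETS / the SQL ENUM
}

for my $age ( 12, 40, 70 ) {
    printf "%3d => %s\n", $age, $BUCKETS[ age_bucket($age) ];
}
```

    The enumeration index, not the raw age, is what gets stored and queried.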

    Unfortunately, they also tend to apply arbitrary limits to the size of an enumeration, 32 or 64 etc. The shame is that in many cases the limits on the number of indexes that may be applied to a given table (MySQL: 32, DB2: (was) 255) coincide. In many cases, large enumeration types could substitute for large numbers of indexes with considerably better efficiency. They can also be used to remove the need for foreign keys in some cases, for another hike in efficiency.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re^2: Basic Perl trumps DBI? Or my poor DB design?
by perrin (Chancellor) on Oct 26, 2004 at 17:34 UTC
    So, I'm embarking on a matrix of tests on that schema

    All you really need to do is learn how to use EXPLAIN. It will tell you how the database processes the query so you can adjust things to make it use correct indexes.
