Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Re: Basic Perl trumps DBI? Or my poor DB design?

by Anonymous Monk
on Oct 25, 2004 at 21:21 UTC ( [id://402325]=note: print w/replies, xml ) Need Help??


in reply to Basic Perl trumps DBI? Or my poor DB design?

To help you wrap your head around this result, here's a bit of perspective:

The DBMS is good at two things - finding something in a file with the minimum number of probes and whipping up results that are larger than memory. It generates programs very much like what you wrote to do it, but it has to guess at what methods to use and which files to read first. Actually I'm not sure how good MySQL is at it (though I hear good things) but Oracle and DB2 work that way -- they figure out an algorithm built out of functions they know how to do that logically solves your query and has a chance to do it in good time (but they usually don't think long enough to go for best).

You're forcing the DBMS to play its worst game - reading all the data to get the answer. Notice that your query necessarily touches every row of each of your designs except DB4. Still, the DB beat or tied perl in all the passes except DB4, and you probably wrote the DB test code quicker for those passes (assuming you're as good with SQL as perl). You could probably tweak DB performance up a bit with some of the suggestions the monks had, but I wouldn't look for it to beat your DB4 program.

Your program has the advantage of knowing (or accidently finding out :) that all of the answer fits in memory while you put it together - all of the responses for the first question minus all the responses for the second.... It's purpose built for the problem and knows facts about the data that the DB might not "consider" in building a plan. However, if you scale to 200M or so respondents you'll find your program breaking down, while the DB solution will probably grind out your answer using different methods from its toolkit, possibly in reasonably linear time (once you cross the elbow where tempfile swapping starts). We do routinely give up quickest for flexible, but many of us also give up simple for scalable, and quick answers for quick coding. I don't want to try to write your program for a different application where locking and versioning become important.

So yes, a special purpose engine is right for this problem if response time is important, and you're on the right track (UKBrowser nailed it). But if you find out you want to ask a different type of question you have to build another engine for the question, and people have this annoying habit of asking a different question right after they read a report. Which is why I'd build a DB anyhow, even if you don't use it for this particular application.

  • Comment on Re: Basic Perl trumps DBI? Or my poor DB design?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://402325]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (6)
As of 2024-04-23 13:30 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found