Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re^4: (Real) Database Design

by Anonymous Monk
on Jul 14, 2004 at 20:51 UTC ( [id://374439]=note: print w/replies, xml ) Need Help??


in reply to Re^3: (Real) Database Design
in thread (Real) Database Design

This has been a most insightful conversation!

I think you've already answered most of my questions (and more importantly, potential questions). I would appreciate any advice you have on the following structure of events, but I think I'm almost ready to begin.

So the basic structure of the database (from one perspective) would go something like this:

  • Database is laid out in a hierarchal directory design. Tables are files on the disk with meta data stored elsewhere (preferably on a separate disk) possibly mirroring the hierarchal structure of the data dir tree.
  • Files are (naturally) binary in nature, the exact representation of the data types shall be determined after much googleage and jolt consumption. (I'm still a little fuzzy on how to separate records in the data file - can I just a fix row length in the meta files and go by that?)
  • As for reading data from the disk as fast as possible - any last tips on caching to ram? As I see it, try to determine the most used data and ensure it's always in ram. For the small dbs I'm planning on building at first it won't be a problem to load the entire database into ram.
  • Seeking specific data from a file - determine applicable fields, iterate over them checking their contents, return the matching results. Any last performance tips here?

This might just be random ramblings at this point, I think you've pointed me in the right direction now. Thanks again for all the help! :)

Replies are listed 'Best First'.
Re^5: (Real) Database Design
by dragonchild (Archbishop) on Jul 15, 2004 at 02:04 UTC
    The data layout is actually very similar to how MySQL lays out its MyISAM-type tables. (InnoDB is laid out differently.)

    Separating records on disk isn't that difficult. Basically, you need to keep track in your metadata where in the datafile each record starts. That is usually done in one of two ways:

    1. If your records are of a fixed length, you keep track of which record it is. So, if your records are 80 bytes long and you're looking for record N, you seek to position 80 * (N - 1).
    2. If your reocrds are of variable length, you store the seek position for each record in your metadata file.

    Now, you're trying to figure out how you know which records to look for. Well, it's quite simple. You generally are saying something like "Give me all the records that have such-and-such qualities." Well, in your metadata, you will hopefully have a cache of those qualities. So, if you're looking for all records that have ColumnA = 'Foo", you might have an index on ColumnA and you'll have a list of all the various values that it has and which records each value corresponds to. If you cannot use an index, you will have to do what's called a full-table scan. This means you have to look through the entire file checking ColumnA for its value. This is EXTREMELY expensive, often taking 100x (or more) as long. Then there's primary keys and foreign keys and all sorts of craziness. But, it all boils down to what information about the record you have cached and what you haven't.

    As for caching to RAM ... databases will generally keep as much in RAM as possible. The most common strategy I've seen is to keep as much in RAM as possible, removing data as needed from the oldest data first. Oracle tends to store any rows that have been retrieved. MySQL will store the results of queries that have been executed.

    I think I've kind of started an answer on the seeking question. Here, my theoretical knowledge is weaker. I've a lot of experience in optimizing SQL queries with indices and well-formed queries, but little in how the engine executes those commands. For that, consult prior art.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

    I shouldn't have to say this, but any code, unless otherwise stated, is untested

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://374439]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (2)
As of 2024-04-26 02:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found