Re^5: (Real) Database Design

The data layout is actually very similar to how MySQL lays out its MyISAM-type tables. (InnoDB is laid out differently.)

Separating records on disk isn't that difficult. Basically, you need to keep track in your metadata where in the datafile each record starts. That is usually done in one of two ways:

If your records are of a fixed length, you keep track of which record it is. So, if your records are 80 bytes long and you're looking for record N, you seek to position 80 * (N - 1).
If your reocrds are of variable length, you store the seek position for each record in your metadata file.

Now, you're trying to figure out how you know which records to look for. Well, it's quite simple. You generally are saying something like "Give me all the records that have such-and-such qualities." Well, in your metadata, you will hopefully have a cache of those qualities. So, if you're looking for all records that have ColumnA = 'Foo", you might have an index on ColumnA and you'll have a list of all the various values that it has and which records each value corresponds to. If you cannot use an index, you will have to do what's called a full-table scan. This means you have to look through the entire file checking ColumnA for its value. This is EXTREMELY expensive, often taking 100x (or more) as long. Then there's primary keys and foreign keys and all sorts of craziness. But, it all boils down to what information about the record you have cached and what you haven't.

As for caching to RAM ... databases will generally keep as much in RAM as possible. The most common strategy I've seen is to keep as much in RAM as possible, removing data as needed from the oldest data first. Oracle tends to store any rows that have been retrieved. MySQL will store the results of queries that have been executed.

I think I've kind of started an answer on the seeking question. Here, my theoretical knowledge is weaker. I've a lot of experience in optimizing SQL queries with indices and well-formed queries, but little in how the engine executes those commands. For that, consult prior art.

------
We are the carpenters and bricklayers of the Information Age.

Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

I shouldn't have to say this, but any code, unless otherwise stated, is untested

Comment on Re^5: (Real) Database Design


Your skill will accomplish what the force of many cannot
	PerlMonks