When I was speaking of hardware, I was speaking of the performance limitations of the hard drive as opposed to RAM and CPU. The actual architecture is unimportant because the operating system deals with that. The important thing is the ratio between the time it takes to retrieve data from a hard drive and the time it takes to retrieve the same data from RAM.

You actually do want to have the metadata in a separate file from the actual data itself. In fact, you want it on a separate hard drive, if possible. That way, your machine can read both files at the same time - taking advantage of multiple and/or dual-core CPUs. This also means that you can modify the metadata without having to re-write the data file.

Metadata is usually a very small amount of data compared to the actual data itself. Remember - a database is usually only useful once you get beyond 3-8 tables and/or 10-20k rows. Metadata usually doesn't get beyond 10-20 kilobytes. Data is usually in the megabytes. You don't want to relocate 20MB just to add another constraint to your 3KB of metadata. This also allows you to (initially) keep your metadata in some human-readable form, like YAML or XML. (Or even Apache's configuration format, readable with Config::ApacheFormat.) That can make initial development a little easier. Or not.
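A rough sketch of that separation, in Python purely for illustration. The file names, the metadata layout, and the `add_constraint` helper are all made up for the example, and JSON stands in for YAML or XML only because it ships with the standard library. The point is that adding a constraint rewrites the small metadata file and never touches the data file.

```python
import json
import os
import tempfile

def write_metadata(path, metadata):
    # Rewrite the small, human-readable metadata file in full.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(metadata, fh, indent=2)

def read_metadata(path):
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def add_constraint(meta_path, column, constraint):
    # Only the metadata file is read and rewritten here.
    metadata = read_metadata(meta_path)
    metadata["constraints"].setdefault(column, []).append(constraint)
    write_metadata(meta_path, metadata)

def demo():
    workdir = tempfile.mkdtemp()
    data_path = os.path.join(workdir, "users.dat")         # hypothetical name
    meta_path = os.path.join(workdir, "users.meta.json")   # hypothetical name
    with open(data_path, "wb") as fh:
        fh.write(b"\x00" * 1024)  # stand-in for a much larger data file
    write_metadata(meta_path, {"columns": ["id", "name"], "constraints": {}})
    size_before = os.path.getsize(data_path)
    add_constraint(meta_path, "id", "unique")
    data_untouched = size_before == os.path.getsize(data_path)
    return read_metadata(meta_path), data_untouched
```

Calling `demo()` returns the updated metadata plus a flag saying whether the data file kept its size.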

To answer your sillier question - every single file is a binary file, from everything's perspective. The only thing that makes a file text vs. binary is how a program has been told to interpret it.

For example, FTP makes a distinction between the two only so it can translate \n in Unix to \r\n in Windows to \r in Mac. That's it. (Well, I'm sure VMS and OS/360 have their own line-ending conventions, but who uses those? *grins*)
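To make that concrete, here is a small Python sketch. The sample bytes and the `to_line_ending` helper are invented for the example, and this is not how any particular FTP client is implemented. The same bytes are just numbers until a program decides to treat them as text, and "text mode" amounts to rewriting the line endings.

```python
def to_line_ending(raw: bytes, newline: bytes) -> bytes:
    # Normalize CRLF and bare CR to LF first, then emit the requested ending.
    normalized = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", newline)

raw = b"id,name\n1,alice\n"

as_numbers = list(raw[:3])       # the "binary" view: plain integers
as_text = raw.decode("ascii")    # the "text" view: same bytes, interpreted
for_windows = to_line_ending(raw, b"\r\n")
```

Nothing about `raw` itself changes between the two views; only the interpretation does.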

You will want to pack your data instead of naively storing it as characters. This reduces the space used, which reduces the time it takes to find a given record. Packing data according to various schemes is the topic of several graduate theses, and well beyond the limited discussion here. (I would suggest googling and having several cases of Jolt available. Oh - and don't buy Doom3 when it comes out on August 6th if you actually want to get this project done.) The important thing is that most of the character packing and unpacking work has already been done for you. You just have to know where to find it.
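As one example of where to find it: Perl has the built-in pack and unpack, and Python has the `struct` module. The sketch below uses Python, and the record layout (a 4-byte id, a 2-byte age, a 16-byte name) is a hypothetical one chosen for illustration, not a recommendation. It shows one simple scheme, fixed-width records, where a large integer takes 4 bytes instead of 10 characters and record n can be located by arithmetic alone.

```python
import struct

# Hypothetical layout: unsigned 32-bit id, unsigned 16-bit age, 16-byte name.
# "<" means little-endian with no padding, so every record is 22 bytes.
RECORD = struct.Struct("<IH16s")

def pack_record(rec_id, age, name):
    # Names shorter than 16 bytes are padded with NULs by struct;
    # longer ones are silently truncated, which real code should guard against.
    return RECORD.pack(rec_id, age, name.encode("utf-8"))

def unpack_record(blob):
    rec_id, age, raw_name = RECORD.unpack(blob)
    return rec_id, age, raw_name.rstrip(b"\x00").decode("utf-8")

def nth_record(buffer, n):
    # Fixed-width records mean the offset is just n * record size.
    start = n * RECORD.size
    return unpack_record(buffer[start:start + RECORD.size])

packed_id = struct.pack("<I", 4000000000)  # 4 bytes
text_id = str(4000000000).encode("ascii")  # 10 bytes
```

The trade-off is that fixed-width fields waste space on short values; variable-width schemes avoid that at the cost of needing some kind of index to find a record.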

Your last question about data structure representation is misguided. You will not be working with the data as it is stored on disk. You will always be loading it from disk into RAM, then manipulating it there. Most metadata exists to reduce the amount of stuff that is retrieved from disk and converted into some useful data structure that lives in RAM.

Remember - a database isn't some magical program that is somehow different from every other program in how it interacts with the world. It's just a complex application that, at its heart, does the following:

Stores data on disk in a compact form
Retrieves selected data from disk as fast as possible

The reason this technology is so important is that retrieving selected data from disk as fast as possible is the heart of modern business. The key word is "selected". Every single application in the world can be viewed as a report on data stored in some datastore, transformed in some way, with some applications giving the ability to modify that data and write it back to the datastore. The trick is pulling back only the data that the user wants to retrieve. Oh - and doing it as fast as possible, preferably beneath the human threshold (about a tenth of a second).
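A minimal sketch of "selected", again in Python and under made-up assumptions: the record layout, the `write_table` and `fetch` helpers, and the use of an in-memory `io.BytesIO` in place of a real file on disk are all illustrative. A small index held in RAM maps each id to a byte offset, so fetching one record is a single seek and a single read rather than a scan of everything.

```python
import io
import struct

RECORD = struct.Struct("<I16s")  # hypothetical layout: id + 16-byte name

def write_table(fh, rows):
    # Append packed rows and return an index of id -> byte offset.
    index = {}
    for rec_id, name in rows:
        index[rec_id] = fh.tell()
        fh.write(RECORD.pack(rec_id, name.encode("utf-8")))
    return index

def fetch(fh, index, rec_id):
    # Seek straight to the one record asked for; read nothing else.
    offset = index.get(rec_id)
    if offset is None:
        return None
    fh.seek(offset)
    _, raw_name = RECORD.unpack(fh.read(RECORD.size))
    return raw_name.rstrip(b"\x00").decode("utf-8")

table = io.BytesIO()  # stands in for the data file on disk
index = write_table(table, [(7, "alice"), (42, "bob"), (99, "carol")])
```

Here the index plays the role of the metadata: it is tiny compared to the data, lives in RAM, and exists only to cut down what has to come off the disk.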

------
We are the carpenters and bricklayers of the Information Age.

Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

I shouldn't have to say this, but any code, unless otherwise stated, is untested

In reply to Re^3: (Real) Database Design by dragonchild
in thread (Real) Database Design by Anonymous Monk
