Typeless Relational Database

tomazos has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Typeless Relational Database by dragonchild (Archbishop) on Jul 01, 2005 at 03:48 UTC
The early versions SQLite didn't make much differentiation between strings and numbers. Due to performance reasons, that kind of intelligence was built into SQLite pretty quick. I would posit that you're not quite sure what you're looking for from a database. Those annoying types are actually a very good thing, kinda like those annoying foreign keys. Each additional constraint you have to deal with when changing the data means one more assumption you can make when querying that data. The more assumptions you can make when querying, the faster the queries can go and the more confidence you have that the results are accurate. Remember - SELECTs generally make up over 99% of a database's activity. If you can improve the cost of a SELECT statement by 1% taking a 10% hit on each modify statement, that's a net gain of quite a lot. And, you have greater confidence in your data. That's a pretty good tradeoff to me. As for type conversions ... if you're doing type conversions, either your schema is wrong or you're not using it correctly. You should never ever have to use a type conversion, not even once. And, yes, that's a hard rule. My criteria for good software: Does it work? Can someone else come in, make a change, and be reasonably certain no bugs were introduced?	[reply]
Re^2: Typeless Relational Database by etcshadow (Priest) on Jul 01, 2005 at 05:40 UTC
As for type conversions ... if you're doing type conversions, either your schema is wrong or you're not using it correctly. You should never ever have to use a type conversion, not even once. And, yes, that's a hard rule. That's going a bit too far, don't you think? I'll try to think of a really good example... What about something like this, maybe: `select ... from table where table.day_in_month = to_number(to_char(sys +date,'dd'))` [download] Should I not be allowed to represent a day-in-month as a number? Should I, instead, be forced to use some sort of crazy "magic-number" reference month and year to store my day-in-month column, so that I can store it in a date datatype, and be able to do purely (but still crazy as all get-out) date-arithmetic to compare dates to the day-in-month? Imagine that the day-in-month is a piece of data used in representing a "recurrence" data-structure (like, "run this report on the 5th day of every month"), so I'm actually NOT talking about something that is a date... I'm talking about something more abstract, which can most easily be represented by type-casting (on top of some other logic). Here's another kind of crazy (but real! I swear!) example: Another, really powerful use of type-casting in SQL comes up in a very fast (but admittedly ugly) way to pull data in one field by correspondance to data in another field. By that I mean, say, give me the X that corresponds to the min(Y). The specific case when I've used this with is trying to pull the earliest created value of something (say FOO is a number, for this example, and you want the FOO corresponding to the min(CREATED)): `select to_number( substr( min(to_char(CREATED, 'yyyymmddhh24miss') \|\| FOO)), 15, 400 ) ) from table where ...` [download] If that's a little hard to understand, that's not too surprising, as it's something analogous to the GRT, but for seeking a min or a max, rather than for sorting (as the GRT is just a special case of the ST for sorting). That is: pack the carrier data (CREATED) together with the payload data (FOO), in such a way as comparisons of the packed carrier+payload correspond to any comparisons made against the naked carrier (i.e. if rowX.CREATED < rowY.CREATED, then to_char(...rowX...) < to_char(...rowY...), accordingly). Then, use comparisons made against the packed carrier+payload combo, which will yield back the data in which you're interested, but as packed carrier+payload. And then, finally unpack the payload from the carrier (i.e. the to_number(substr(...)) construct). Anyway, I know it's esoteric, but it is real, and it's WAY more efficient than the more purist SQL: `select FOO from table where CREATED = (select min(FOO) from table where ...) and ...` [download] particularly if there's some interesting stuff in that "..." part of the query (which has to be repeated in the sub-query). (Oh, and yes, I did purposefully gloss over a few details in that example, that weren't vital to getting the point accross.) `------------ :Wq Not an editor command: Wq` [download]	[reply] [d/l] [select]
Re^3: Typeless Relational Database by dragonchild (Archbishop) on Jul 01, 2005 at 13:30 UTC
select ... from table where table.day_in_month = to_number(to_char(sysdate,'dd')) `SELECT ... FROM table WHERE DAYOFMONTH( NOW() ) = table.day_in_month` [download] Your RDBMS should provide you with sufficient functions so as to make this unnecessary. And, though I don't have the reference in front of me, I'm pretty sure that Oracle does, just like MySQL and PostgreSQL both do. Your second example is an optimization. Those are deliberate breakings of good practices (that reduce developer time) in order to gain in some other area (such as processor time). It's the classic tradeoff. My criteria for good software: Does it work? Can someone else come in, make a change, and be reasonably certain no bugs were introduced?	[reply] [d/l]
Re^4: Typeless Relational Database by etcshadow (Priest) on Jul 01, 2005 at 14:48 UTC
Re^5: Typeless Relational Database by dragonchild (Archbishop) on Jul 01, 2005 at 16:12 UTC
Re^2: Typeless Relational Database by tomazos (Deacon) on Jul 01, 2005 at 05:02 UTC
I agree that typing is a good thing for some applications, however I think the arguments against requiring typing in Perl could be made for a database. Yes, having constraints on the data that goes into the database is good for efficiency, reliability and performance - hometimes I'd like to optimize for development time. For example, a small unimportant system that isnt going to hold much data, or a prototype that needs to be rapidly developed with a small investment in developer time. A typeless relational database has (would have) applications. And as for never having to use type conversions, what about shortening a string, or converting a numeric string to an int. Or making sure that the number is low enough to fit in a byte. etc etc. Andrew Tomazos \| andrew@tomazos.com \| www.tomazos.com	[reply]
Re: Typeless Relational Database by Cody Pendant (Prior) on Jul 01, 2005 at 03:59 UTC
So why can't I have the same thing from my database? Every database I've seen has been an SQL relational database. You have to take a long time specifying column types and what not. One almost-too-obvious answer would be "because the database has to read from and write to a disk sooner or later". Your Perl variables are being managed in RAM, and Perl will do whatever it can for you until it runs out of RAM altogether. The DB on the other hand, is reading and writing on a HD somewhere, and its function is to manage that reading and writing for you, so you don't have to worry about it. The more a DB "knows" about the kind and size of the information it's dealing with, the more efficient it can be. ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss') =~y~b-v~a-z~s; print	[reply]
Re^2: Typeless Relational Database by tomazos (Deacon) on Jul 01, 2005 at 05:05 UTC
I'm not sure I see the difference between reading and writing to RAM and reading and writing to the HD in this regard. At the end of the day the information is stored as a sequence of bytes. Whatever technique Perl uses to store a $scalar in RAM could be used in the same fashion to store it on the HD. Could it not? Andrew Tomazos \| andrew@tomazos.com \| www.tomazos.com	[reply]
Re^3: Typeless Relational Database by CountZero (Bishop) on Jul 01, 2005 at 08:37 UTC
In theory, yes; in practice, no. Reading/writing from/to RAM is many times faster and reorganizing the storage in RAM is much easier. In order to get any kind of acceptable speed out of your HD, lots of tables need to be updated and saved to allow fast access to the data. In RAM-storage, in a pinch you can simply walk through your RAM to find the data you need, if you do that on a HD, you can go out and get yourself a coffee (and a donut or a toasted bagel) and return before your data is found. CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law	[reply]
Re: Typeless Relational Database by CountZero (Bishop) on Jul 01, 2005 at 05:36 UTC
If you access your (SQL/relational/typed) database with DBI/DBD, you really get (sort of) a typeless database. Read a record with a "string" field containing only figures, put it in a scalar, multiply it by something, store it again in the scalar and save it in the database: the data start as a string, gets silently converted to an integer or float (as need be) and gets saved as a string again. Perl has taken care of all type-conversions. So what are we complaining about? The small effort of setting-up the database? I actually find that a Good Thing^tm as it makes you think about your data, but YMMV. And if you want to skip even that little effort, declare all your fields to be '`VARCHAR`' (max. length 255) or '`TEXT`' (max. length 65,535) in MySQL. One single type to rule them all and in the database bind them! Looks pretty (and) typeless to me. CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law	[reply] [d/l] [select]
Re^2: Typeless Relational Database by jhourcle (Prior) on Jul 01, 2005 at 11:52 UTC
Yes, it'll work... and if you use it for any extended period of time, you'll most likely never use it again. (and prove that strict typing can be a good thing) I've had to deal with tables where someone had dne this -- every field was a varchar(255), completely denormalized. (it was basically just logging information from sitescope). I ran into the following problems: Any sort of selects were slow.-- Every field was larger than it had to be (strings for dates, strings for numbers, etc.) as it could read fewer records per disk access Because the fields weren't fixed length, anything with a 'where' clause couldn't just know that field 6 started at byte 24 ... it'd have to look through fields 1-5. Inequality matches were abysmal. SQL doesn't have seperate 'lt' and '<' operators, so you had to do the conversions by hand...and you'd have to parse all of the date strings every time. No sanity checking on values -- because you could write anything into any field, you can't catch bad data on insert -- you find the problems when you go to read. There were probably other problems, that I either never found, or have managed to block out from that horrible time. Yes, this allows you to start collecting data with little thought, but it makes it a pain to deal with the data in the future -- it's like saving any type of file to a folder, without any markings if it's an image, text file, or some other format -- writing is just fine, but it's much more work when you have to go to actually read them. It was so horrible to deal with, and as I couldn't change the application that was writing, I found it easiest to move the data from the bad table to something more compact, with typing. So, to anyone who thinks strict typing is a problem -- try this, as an experiment. If it works for you, fine. If it doesn't, well, then get rid of it as quickly as possible.	[reply]
Re: Typeless Relational Database by anonymized user 468275 (Curate) on Jul 01, 2005 at 13:32 UTC
In an RDBMS, join optimisation relies on having fixed length fields (so it can build a cross-product in memory). A perl scalar for the reasons you point out can't be fixed length and so it would be harder (but not impossible) to write a perl-like RDBMS which delivered reasonable performance. One world, one people	[reply]
Re: Typeless Relational Database by herveus (Prior) on Jul 01, 2005 at 13:33 UTC
Howdy! SQLite has not discarded the flexibility, although it does now store integers as integers, it still permits you to store arbitrary data in a column. Most (every other?) RDBMS uses container typing (the column is declared with a type and you have to conform), but SQLite uses value typing (where the type is associated with the value, much the way Perl scalars work). SQLite is that "typeless" RDBMS you are looking for. yours, Michael	[reply]
Re: Typeless Relational Database by samtregar (Abbot) on Jul 01, 2005 at 21:09 UTC
MySQL is pretty close, particularly 3.x. Just make all your columns VARCHAR(255) and when you treat them as numbers MySQL will autoconvert. If you want to allow for bigger fields just make them all TEXT, which should work (although I've never tried treating a TEXT field as a number). I agree that a typeless relational database would be a very useful thing to have. I might not use it for the most critical projects but it would make one-offs a lot easier to develop. -sam	[reply]


more useful options
	PerlMonks