Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change

Re^2: Typeless Relational Database

by etcshadow (Priest)
on Jul 01, 2005 at 05:40 UTC ( #471555=note: print w/replies, xml ) Need Help??

in reply to Re: Typeless Relational Database
in thread Typeless Relational Database

As for type conversions ... if you're doing type conversions, either your schema is wrong or you're not using it correctly. You should never ever have to use a type conversion, not even once. And, yes, that's a hard rule.

That's going a bit too far, don't you think? I'll try to think of a really good example... What about something like this, maybe:

select ... from table where table.day_in_month = to_number(to_char(sys +date,'dd'))
Should I not be allowed to represent a day-in-month as a number? Should I, instead, be forced to use some sort of crazy "magic-number" reference month and year to store my day-in-month column, so that I can store it in a date datatype, and be able to do purely (but still crazy as all get-out) date-arithmetic to compare dates to the day-in-month? Imagine that the day-in-month is a piece of data used in representing a "recurrence" data-structure (like, "run this report on the 5th day of every month"), so I'm actually NOT talking about something that is a date... I'm talking about something more abstract, which can most easily be represented by type-casting (on top of some other logic).

Here's another kind of crazy (but real! I swear!) example: Another, really powerful use of type-casting in SQL comes up in a very fast (but admittedly ugly) way to pull data in one field by correspondance to data in another field. By that I mean, say, give me the X that corresponds to the min(Y). The specific case when I've used this with is trying to pull the earliest created value of something (say FOO is a number, for this example, and you want the FOO corresponding to the min(CREATED)):

select to_number( substr( min(to_char(CREATED, 'yyyymmddhh24miss') || FOO)), 15, 400 ) ) from table where ...
If that's a little hard to understand, that's not too surprising, as it's something analogous to the GRT, but for seeking a min or a max, rather than for sorting (as the GRT is just a special case of the ST for sorting). That is: pack the carrier data (CREATED) together with the payload data (FOO), in such a way as comparisons of the packed carrier+payload correspond to any comparisons made against the naked carrier (i.e. if rowX.CREATED < rowY.CREATED, then to_char(...rowX...) < to_char(...rowY...), accordingly). Then, use comparisons made against the packed carrier+payload combo, which will yield back the data in which you're interested, but as packed carrier+payload. And then, finally unpack the payload from the carrier (i.e. the to_number(substr(...)) construct).

Anyway, I know it's esoteric, but it is real, and it's WAY more efficient than the more purist SQL:

select FOO from table where CREATED = (select min(FOO) from table where ...) and ...
particularly if there's some interesting stuff in that "..." part of the query (which has to be repeated in the sub-query). (Oh, and yes, I did purposefully gloss over a few details in that example, that weren't vital to getting the point accross.)
------------ :Wq Not an editor command: Wq

Replies are listed 'Best First'.
Re^3: Typeless Relational Database
by dragonchild (Archbishop) on Jul 01, 2005 at 13:30 UTC
    select ... from table where table.day_in_month = to_number(to_char(sysdate,'dd'))
    SELECT ... FROM table WHERE DAYOFMONTH( NOW() ) = table.day_in_month

    Your RDBMS should provide you with sufficient functions so as to make this unnecessary. And, though I don't have the reference in front of me, I'm pretty sure that Oracle does, just like MySQL and PostgreSQL both do.

    Your second example is an optimization. Those are deliberate breakings of good practices (that reduce developer time) in order to gain in some other area (such as processor time). It's the classic tradeoff.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      So it's your contention that "DAYOFMONTH(...)" is not a type-conversion? I see one type going in (date) and a different type coming out (number). Or do you mean to say that only type-conversions that are built-in functions (but excluding the to_X(...) ones) are ok to use? Seems like an arbitrary line. Also, I doubt whether the makers of RDBMS have figured out every possible built-in type-converting function that might make sense to exist (good RDMSs allow developers to define their own functions for a reason).

      As for the second, yes, I know full well that it's an optimization. An ugly one, in fact. The sort of thing that I don't even write out by hand, but have perl code generate for me. However, I don't think that makes it "wrong". Premature optimization is wrong, sure, but necessary optimization is... well... necessary. And it is hardly "wrong".

      I think I'm sounding upset, and that's not how I mean it. I'm not upset at all, nor am I trying to turn this into a heated argument. I was just being pedantic about the absolutism of your statement that type-conversion is ALWAYS wrong. I heartily agree that type-conversion is *usually* wrong. But it's a lot less always wrong than goto, for example, and even goto has, like, one or two legitimate uses.

      ------------ :Wq Not an editor command: Wq
        I don't think of DAYOFMONTH() as a type-conversion. I think of it as a one-way data transformation. I'm extracting data, not converting the type. This may sound like semantics, but it's not. For example, do you consider accessors as type converters? You can think of DAYOFMONTH as an accessor on a DATE column. Now, if you were to write a DAYOFMONTH function in Oracle, that wouldn't be a type converter. It may utilize type-conversion as part of its process, but that's irrelevant to the user.

        Optimizations aren't wrong, and they don't violate the hard rule I mentioned. Well, they do, but they do so because they're increasing developer cost to reduce some other cost that's been deemed more important. The hard rule I mentioned assumes you want to minimize developer cost. If you don't, then violate the rule. :-)

        Goto has plenty of uses. Unfettered goto is less useful. Remember - next and last are both goto statements with severe restrictions. The only hard and fast rule I can think of is that every rule has an exception, usually because you sidestepped one of its assumptions.

        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://471555]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (7)
As of 2020-09-28 20:40 GMT
Find Nodes?
    Voting Booth?
    If at first I donít succeed, I Ö

    Results (144 votes). Check out past polls.