http://qs321.pair.com?node_id=139869

Hi all,

    I have been working on a module at work, and something came up that i thought would create, if nothing else, an interesting debate. First, i will explain what i am working with, then, what i have done.

The Job

    We have a process by which a table is filled with job-specific data for a great number of processes. These include scheduling information, runtimes, success and failures, etc (Data held over several tables, etc) . I am working on a module to allow generalised access to this data, for the obvious reasons.
    My first module, is to allow access to the main job table, which contains the job_name (key), and the general job information, including schedule. Here come the part that people seem to object to ...

The Sin

    I am using SELECT * in my SQL. Now, before my fellow monks fall dead to the floor, let me explain. The new() constructor takes a job name. The new() constructor, then uses DBI and DBD::Sybase to query the database using fetchrow_hashref(). Then, i set a key in my hash-ref-object to the hashref returned. To date i have heard nothing in code reviews but "Never use select *", but the reason for that is that your code will break if a field is ever added.I think that in my case, using select *,fetchrow_hashref() and AUTOLOAD makes the module more flexible when columns are add/removed/reordered.

A Moral ?

    I am interested to see if anyone thinks that i should not be using select *, and if they have any reason other than superstition. I must say, i have always said to people not to use it, but i think i may have found an exception... Let me know what you think.
from the frivolous to the serious

Replies are listed 'Best First'.
Re: Module Pondering
by Masem (Monsignor) on Jan 18, 2002 at 21:23 UTC
    While using fetchrow_hashref is the counterbalance to using "select * ...", the problem is that _hashref is *slow* compared to fetchrow_array or the likes, and if this is a highly-accessed script, that might cost significant CPU time. It's typically better to specify what columns you want in the select, and then, if you really need that hash, use a hash slice to merge the column names and data (eg @row{ @columns } = fetchrow_array). If you don't necessarily know column names to start, it's easy to get them via the DBI interface, and this only has to be done once at the start of the program.

    -----------------------------------------------------
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
    "I can see my house from here!"
    It's not what you know, but knowing how to find it if you don't know that's important

(Ovid - more "Death to Select Star") Re: Module Pondering
by Ovid (Cardinal) on Jan 18, 2002 at 21:48 UTC

    Masem gave a good answer and I want to extend it by pointing out some other reasons for not using SELECT * FROM foo.

    I was puzzling through some old code trying to figure out what was going on. The programmer had used fetchrow_hashref (with the SELECT *) to grab a bunch of data that controls client customization. Later on in the code, I saw that he was doing some more selects to grab display information. I used Data::Dumper to dump the results of his first SELECT and lo, he already had the information! It turns out that we had a poorly normalized database and the data was stored in two different places. The programmer didn't realize this, but the original SELECT * hid from him exactly what fields he was grabbing. The moral? Specifying field names is self-documenting if the fields are named correctly.

    Tell me what type of data the following returns:

    SELECT * FROM customer WHERE customer_id = 7

    You could make some good guesses, but unless you go to the database or print out the return data, you're not going to know. Making self-documenting code is incredibly important and should never be overlooked. Not forcing yourself to jump through hoops to look up data is a good example of the Perl virtue "Laziness".

    Another reason for naming the fields you want is the idea that your code should do as little as possible. If you have 30 fields in a table, but only need two, why grab all 30? Yet another good example of "Laziness".

    Lastly, ignoring the performance issue, we all know one of the worst problems with hashes:

    $total_orders += $salesman{ orders_recieved };

    In the above example, 'received' is spelled wrong. If you're not careful in your code, that undef field can evaluate as zero and your total orders are incorrect. That's always been a problem with hashes and you'll find many modules that attempt to rectify this (and this is part of the reason the ill-fated "pseudo-hash" was created).

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

Re (tilly) 1: Module Pondering
by tilly (Archbishop) on Jan 19, 2002 at 10:46 UTC
    I have a different and (to my mind) far more important reason for not selecting *.

    The problem is that you want to make errors obvious as early and as explicitly as possible. If the programmer has been explicit about what columns are expected there, then in the future if you rebuild the database, migrate the script, etc you will find out early about any bad assumptions about what columns are present. By contrast if you select *, things may look like they work for a good bit, but expected data is not present (because the columns were changed out from under you). This will likely take longer to debug simply because if (not when) you get an error message, it will not be obviously tied to where the actual broken assumption is.

    So be explicit up front and some day it will save you a headache.

Re: Module Pondering
by tadman (Prior) on Jan 18, 2002 at 22:56 UTC
    I couldn't agree with Ovid more. Using 'SELECT *' is not just about column alignment mismatch issues. When you 'SELECT *' on a 500 column table, all the columns are loaded, even if you only use 1 or 2. This can be a lot of data exchange if your DB resides on a remote server.

    Generally, you have an idea as to what you are asking for, or at least, what the application of your select will be. Even if you are using the hashref methods, less data loaded means faster transactions.

    The "Moral"? Just because you can make 'SELECT *' safe to use does not make it the right thing to do. As they say: Just because you can doesn't mean you should.
Re: Module Pondering
by z3d (Scribe) on Jan 21, 2002 at 00:23 UTC
    As if the legitimate issues regarding selecting too much data, etc., weren't enough....a comment from a different perspective. I seem to recall a few years ago that there were a few databases (vendor-wise) that didn't respond correctly to the select *. Sometimes they returned all of the data, sometimes not. If I remember right, this was a flaw in Microsoft SQL 7 server (yeah, yeah, yeah, we all have our burderns to bear). In addition, though, I have on occasion seen select * not return all of the columns. Sporadic, non repeatable, I know, but the lack of perfect success leads me to suggesting that you should avoid select * whenever possible.

    Not to mention the load once your database starts getting to a decent size...

      Very interesting ...

      I avoid SQLServer (all versions), but someone emailed about a possible bug with my module, DBIx::XHTML_Table. They said that when using select * in my module with SQLServer7, some of the field names did not show up.

      Thanks for pointed this out. Takes a load off of me. ;)

      jeffa

      L-LL-L--L-LL-L--L-LL-L--
      -R--R-RR-R--R-RR-R--R-RR
      B--B--B--B--B--B--B--B--
      H---H---H---H---H---H---
      (the triplet paradiddle with high-hat)