http://qs321.pair.com?node_id=390494

I've just spent a gigantic amount of time wrestling with a problem, and since this is similar to other problems I've had in the past, and must be something that other people experience, I wanted to share it. Perhaps it's totally obvious, in which case I apologize for boring anyone reading. Maybe it should be in my use.perl journal. But if I were a professional programmer, I'd want to turn to the guy next to me and say, "You know what the problem was? It was ---". But I'm not, and I can't, so here I am.

I do a lot of stuff relating to web-database interactions.

When I started learning, I wrote absolutely everything by hand, and cut-and-pasted all over the place. It was tedious, and I made lots of mistakes, and wasted a lot of time. But I also knew that everything worked exactly the way I wanted it to.

One of the first things I picked up from somewhere was that when you checked your form parameters, one thing you should do is make sure there's something there--if it's blank, or just spaces, skip it. So, from the very start, all my programs had something in them like this:

foreach my $field (@fields) { my $value = param($field); $value =~ s/^\s+//; $value =~ s/\s+$//; next if $value eq ""; # construct SQL or otherwise work with $value }
Eventually I wrapped this in a function, then as I learned more, a method, and in both cases I forgot entirely about it.

Recently I've been working on increasingly larger and more complicated projects, and the roll-your-own approach has been getting difficult. So I've started to use various systems for doing things--templating systems, database abstraction layers, etc. Which particular ones don't matter, as my point is not to criticize the way any one works.

In many ways these have made things much better--certainly a lot of things have become faster to get going. But I often find it very difficult getting things to work exactly as I want, and end up overriding much of whatever system I'm using. But on the whole I'm finding it an advantage, frustrating though can be.

But I just ran into a problem that had me frustrated for days. In one database, I have some columns that are has_a lookups to another table; I add/edit these with drop-down menus. Some can be blank, so I added a blank row to my select generation--<select name="format"><option></option>. To my frustration, this did not give me an empty column; in my display, I'd have the field displayed--"format", or whatever--but with no value. This only happened for these drop-down menus, never with blank textfields or anything else.

The problem, it finally turned out, was that the abstraction layer I was using, which has all sorts of features to handle data validation, does not throw out blank values, it just sends them right through to the database. So if you leave a field blank, "" gets sent to the database. And most of the time, "" gets inserted into the database. Now, I know there's a difference between "" and undef in Perl, just like I know there's a difference between "" and NULL in my database. But staring at my raw database records, the lines with nothing (instead of NULL) next to them just didn't look wrong.

Now, the reason I didn't notice it until I added the select lists: My blank textfields, which also had "" inserted in them, were almost always VARCHARs or TEXTs in the database, so inserting "" would put "" into the database. My View class did DWIM, and wouldn't display a line if the value were "".

The select menus, on the other hand, were all INT values (they're all has_a relations, holding the PK of the lookup table). And when you insert the "" into an INT field, you don't get "", you get 0. And that, when run through the View class, is what showed the problem--the 0 "existed", and thus I'd get a blank line in the display (blank because the 0 didn't correctly index the lookup table).

I ended up writing a line to preprocess my parameters before handing it off to the database layer:

my %good_params = map { $_ => $params{$_} } grep { $params{$_} ne ""} keys %params;

Maybe if I were a better programmer, I would have thought of this earlier. Or maybe if I started in C. Perl usually DWIM, and when I have a blank textfield, I don't mean that I want "" inserted into my database, I mean that I want to leave the field alone.

I don't even know if there's a lesson here. Maybe my initial tip that you should check for "" before putting things into your DB was too thorough, or I internalized it so fully that the possibility of doing it any other way never came up. Maybe I should just have been more aware that Perl doesn't do this for you automatically. Maybe my abstraction layer should do this for you in its validation routines. Maybe I should write every line of everything myself, so I know that it's doing what I want. Maybe I'm just a moron, and should leave the programming to the programmers.

Update: While writing this, I realized a further problem that, incredibly, has never come up: I don't want to skip the field if it's an edit, rather than add, action. What if I have a field with a value in it that I want to remove? I can't blank it out, because then it will be blank, and my neat map above will just skip that field, and the original value will remain. On the other hand, if I instead map it to undef, I see that my abstraction layer will take it upon itself to skip the line! Rather than dealing this this now, I'm going to sleep.

Replies are listed 'Best First'.
Re: Writing It Myself, DWIM, and Expectations
by Zaxo (Archbishop) on Sep 13, 2004 at 05:06 UTC

    The DBI table_info method should be helpful in dredging up what sort of columns your db abstraction layer is producing. The NULLABLE field will tell you whether things written as undef will be returned as empty strings.

    That sort of problem is why I'm suspicious of dbi abstractions. I prefer to have a collection of carefully designed db schema which can be selected and plugged into an application. That kind of close coupling between SQL column attributes and perl data is a pain to maintain if you do it ad hoc. A carefully kept library of common column types and tables is not that much easier to maintain, but there is only one of them.

    Is your toolkit using CGI.pm? With CGI, what you write as keys %params is returned by the param method without arguments, $q->param.

    After Compline,
    Zaxo

      I would love to see you write a meditation on this. DB abstraction layers are of interest to me, especially when I come across something that can help me improve my hand-rolled ones. :-)

      Being right, does not endow the right to be rude; politeness costs nothing.
      Being unknowing, is not the same as being stupid.
      Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
      Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Writing It Myself, DWIM, and Expectations
by johnnywang (Priest) on Sep 13, 2004 at 04:46 UTC
    There is a disussion of the similar issue in Modules, Frameworks, and Reinventing the Wheel. I've found myself in similar situations where some subtle points of a module (or a built-in function, for that matter) are only discovered after my long struggle. Many modules (and other open source projects) are so well-written that one can get one up and running in no time. The sublties/limitations will take some time to surface. There probably isn't any real solution. After all, the authors wrote those modules to solve their problems, which are rarely exactly our problems. On the positive side, that's why we still have a job.
Re: Writing It Myself, DWIM, and Expectations
by tilly (Archbishop) on Sep 13, 2004 at 16:38 UTC
    Now, I know there's a difference between "" and undef in Perl, just like I know there's a difference between "" and NULL in my database.

    You do?

    Not if your database follows the SQL-92 standard!

    Seriously, databases are not supposed to have the idea of an empty string. I learned this from mpeppler when he explained to me why DBD::Sybase was converting empty strings into spaces. Of course to confuse the issue, some databases (eg MySQL IIRC) do allow empty strings.

      The SQL-92 standard makes a distinction between empty string and NULL. It is Oracle that considers empty string and NULL to be equivalent. My impression is that Sybase and old versions of SQL Server behave the same way.
        Sybase (and presumably MS-SQL) will silently convert "" to " " as it can't store a zero-length string that is not NULL.

        Michael

      > Now, I know there's a difference between "" and undef in Perl, just like I know there's a difference between "" and NULL in my database.

      You do?

      Not if your database follows the SQL-92 standard!

      Leaving aside the issue of what the SQL-92 standard actually says, and also leaving aside the holy war about whether MySQL is a real database or not, the fact remains that my database, which is MySQL, does treat "" and NULL as different, so my above statement is correct.

      In the meantime, I've decided to use a different input validator, which is working much better for me.