http://qs321.pair.com?node_id=265961

Seumas has asked for the wisdom of the Perl Monks concerning the following question:

I have some functions that perform SQL queries (DBI.pm), updates and inserts using data that is passed to the function.

I want to be able to handle both quoted and unquoted strings in this function, so I need to be able to detect whether the strings have been quoted already (and if not, quote() them).

Is there a way to do that? It seems that if I quote() an already quote()'d value, it requotes the existing escape quotes and really screws the string up.

Replies are listed 'Best First'.
Re: Test if string is already quote()'d?
by grantm (Parson) on Jun 14, 2003 at 22:46 UTC

    DBI's quote method is one of those things that you should almost never use. Instead of saying something like this:

    my $name = $dbh->quote("widget"); $dbh->do("delete from products where name = $name");

    It's much safer to use placeholders like this:

    $dbh->do("delete from products where name = ?", $name);

    In this case, your function would accept only unquoted strings, but it's a better solution in the long run.

    Update Sorry, that should have been:

    $dbh->do("delete from products where name = ?", undef, $name);

    The do method accepts an optional hashref of attributes as its second parameter.

      I use placeholders when possible, but I'm encountering difficulty in situations where I am building an SQL statement depending on various conditions.

      The only way I have found to handle such a situation is to build the SQL statement into a scalar such as $sql and then do a prepare($sql) or do($sql). But then I have to figure out how to build the execute() so I can stick the bind values in. Since the statement I'm building will have a different number and placement of placeholders depending on the situation, the number and order of values to stick in execute() will vary, too.

      I have tried a number of joins and mappings to make this all come out right, but it's nearly impossible to stuff all the values into the right order when the statement can vary drastically from one run to another.

      Also, as I understood from the DBI docs, do() should prepare (including escape quoting) a statement -- but this only seems to be true if you're using placeholders. In other words, if I build a statement and stuff it into $sql (where $sql contains the full statement without any placeholders) and run do($sql), my strings are still naked.
        "I have tried a number of joins and mappings to make this all come out right, but it's nearly impossible to stuff all the values into the right order when the statement can vary drastically from one run to another."

        Nonesense. ;)

        Say you have a hash ref like so:
        my $hash = { name => 'Joe', phone => 222, code => 'A', };
        Just a simple hash ref whose keys are fields in a database table. Now consider this sub that takes two args, a scalar (the name of the table to insert into) and a hash ref:
        sub insert { my ($table,$hash) = @_; my $sth = $dbh->prepare( "insert into $table (" . join(',', keys %$hash) . ') values (' . join(',',map '?',keys %$hash) . ')' ); $sth->execute(values %$hash); }
        What's so impossible about that? Of course, you don't have to use a hash ref, you could use a hash - or two arrays, or two array refs. The important thing is that key 3 always points at value 3 - order doesn't matter, and that's why i prefer a hash for this.

        If your queries are still too complex for this, then i recommend you look into SQL abstraction modules such as DBI::Wrap or DBIx::Abstract.

        Also, i only use do() for quick-n-dirty stuff. From the docs:

        The do() method can be used for non repeated non-SELECT statement
        (or with drivers that don't support placeholders) ...
        
        Best to stick with prepare, but note that if every query is going to be significantly different (different columns, different tables), then you won't gain much from prepare's caching.

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        B--B--B--B--B--B--B--B--
        H---H---H---H---H---H---
        (the triplet paradiddle with high-hat)
        
Re: Test if string is already quote()'d?
by adrianh (Chancellor) on Jun 14, 2003 at 23:19 UTC

    In the general case it's impossible. For example if your string is 'hello' is this a string containing quote marks that need escaping, or an already escaped string just containing the word "hello"?

    As grantm pointed out the right solution is to use placeholders.

    If you need to generate the SQL then you just need to keep an array of values around to pass to the execute. Every time you get to the stage that you want to embed a value in the SQL add a '?' instead and push the actual value onto your array. When you have your finished SQL you have the list of values ready to pass to execute.

      Yeah, that's what I was assuming with regard to quoted strings. The problem wouldn't be so bad with outside quotes, but once you get into having to check whether the value was actually escape quoted ('O\'Donnel' for example) and so on... Ugh.

      I suppose I can just go ahead and push the bound parameter values into an array to execute. That seemed like a clunky way to do it in my head, because it just seemed too possible to miscount or trip up... but maybe that's not as likely as I first thought.

      Not to mention... quote()'ing everything is insane and I've been avoiding it as much as I can except in this specific circumstance.

      Sometimes you have the answer all along and just need people to reaffirm that you're not stupid. :) Thanks.

      The good thing is that this isnt a general case. He isn't trying to extract a quoted string from a stream afaict, but rather from a variable that he probably has considerable control over. In that case I would imagine that something like

      my $arg=shift; $arg=~s/\A(["'])(.*)\1\z/$2/s; $arg=quote($arg);

      would be perfectly acceptable, and if he had total control over the input then a simple $arg=quote($arg) unless ($arg=~/\A(["']).*\1\z/s) would suffice, and might even offer interesting power play opportunities.


      ---
      demerphq

      <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
        If this is the case, then I would suggest enforcing a coding standard along the lines of never quote a string until just before applying to the database.

        The question is a bit vague. Where are these strings coming from (database? user? other code?)? Maybe the O.P. should ask themself: what is the contract of the code that I am writing?.

        It sounds like the current contract is something like: put stuff in the database that may or may not be ready to go in the database. Since there is no way to tell if something is ready or not, the contract cannot be fulfilled.

        ...And you can't tell if something is prepared or not. If I hand you the string 'hello', do I want you to insert 'hello' or do I want you to insert '''hello''' ?

        It's true that I do have a degree of control over things here in that all of the software code is written by my hand and my hand only - and that the input comes almost entirely from form-input by users.

        Part of my intention here, however, was to decide whether I would have to force myself to always send quoted or non-quoted variables to my functions and stick with that practice, or if I could very easily write my functions so that they could say "oh, this is already quoted so I'll use it as-is" and "this one isn't quoted, so I'll quote() first before applying it into the SQL query".

        Also, doing some quote testing would probably suck as I allow HTML in the input (it's an auction site).

        Since I was finally able to refine my abstraction layer to something very simple when it comes to the INSERT and UPDATES, I've decided to stick with the practice of _NEVER_ quote()'ing anything. I'll just pass everything to my Auction::DB class and let it do the work (I know, I should have done this in the first place, but I was looking for a solution outside of putting it in my DB class since I wasn't quite ready to roll it out yet).


        Here is the solution I have used:

        I have an sqlInsert() method in my Auction::DB class:

        sub sqlInsert { my($self, $table, $data) = @_; my $names = join(", ", map { /^[_|-](.*)/; } keys %{$data}); my $pholders = join(', ', ('?') x keys %{$data}); $self->_sqlExecute("INSERT INTO $table ($names) VALUES($pholders)", +values %{$data}); } sub _sqlExecute { my ($self, $sql, @values) = @_; $self->_sqlConnect(); my $retval; eval { my $sth = $self->{_dbh}->prepare($sql); $retval = $sth->execute(@values); }; if ($@) {croak ($@ . "\nSQL: $sql\n" . Dumper(\@_));} return $retval; }


        Then I call it like this (by the way, I use PostgreSQL but this should work with most DBs):

        sub createNewAccount { my ($self, $fields) = @_; # Verify required fields. $self->_validateUsername($fields->{_USERNAME}); $self->_validateEmailAddress($fields->{_EMAIL_ADDRESS}); # Populate other required fields. $fields->{_PASSWORD} = $self->_makePassword(); $fields->{_CREATION_DATE} = 'now()'; } $self->sqlInsert('user_accounts',$fields); $self->sqlCommit; }


        Now, I never need to concern myself with quoting again. And yes, I know I should have just done this in the first place, but I was looking to tidy up my existing code separately from my transition to a mod_perl/OO based re-write of the site (which I am actually in the process of doing right now).

        I also understand that Perl is a horrible first language and it's a really bad idea to make Perl your first introduction to object oriented coding. However, I started my software in Perl five years ago, that's what it's in now, it's what I know the best (so far) and it's more important to maintain/improve the software and the site as it is now than to rewrite a 16,000+ line program in an entirely different (and new to me) language.

        Fortunately, I've had Perlmonks to refer to as a lot of the pitfalls and questions I've encountered during this transition to OO have already been asked by others. :)
Re: Test if string is already quote()'d?
by crackotter (Beadle) on Jun 15, 2003 at 01:12 UTC
    I am kinda new to perl but couldn't you use regular experssions? kinda like this -
    if ($string =~ /\"/) { Quoted } else { Unquoted }
    of course you would change the regular expersion to only match strings that BEGIN and END with quotes. Hope that helps

      As adrianh has already noted, you can't say for sure if 'hello' is a quoted string or an unquoted one waiting for its quotes to be escaped.

      A better regexp would be

      print "quoted" if $string =~ /^["'].*\1$/;

      But then again, it won't solve the problem of checking for escaped quotes within the string.

      Take 'O'Reilly', for example. It will pass the quoted test, but it will have a unescaped quote inside.

        But then again, it won't solve the problem of checking for escaped quotes within the string.

        To me this is the wrong time/place to worry about this. You deal with issues like that when you read your data in. My understanding is that the OP wanted to design an interface that could take quoted variables and do the right thing, or take unquoted variables and do the right thing. As long as the interface defines clearly what constitues a quoted variable, and what the rules it uses to recognize one, then it is the callers responsibility to handle the data appropriately.

        Consider that perhaps on occasion the fact that the quote handling isn't perfect may be a useful workaround for a tight situation.

        However the fact that this behaviour is open to abuse might make me add a safety mechanism to prevent it if I choose, and would certainly result in a note in the documentation stating that it is no more secure than the data it is fed.


        ---
        demerphq

        <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
        Still better is:
        print "quoted\n" if /^'([^'].*)'$/ and ($1 !~ /[^\\]'/);
        That makes sure the string starts and ends with a quote (as the previous one does), and that there are no unescaped quotes in the string apart from the first and last.

        This makes more assumptions about what your string looks like; it assumes you're using MySQL-style quoting---quote char is single-quote, escape char is backslash. I don't think there's a way around hardcoding your knowledge of your database's quoting into your script; the previous post.

        And you still don't know for sure if 'hello' is a quoted string or an unquoted string containing quotes at the beginning and end; you need some kind of meta-information for that.

        Seems like this could be a useful application of Text::Balanced...

        I don't know if this helps, but anyway:

        #!/usr/bin/perl -w use strict; use Text::Balanced qw( extract_quotelike ); for ( 'O\'Reilly, \'Tim\' "Timmy"', 'q # an octothorpe: \# (not the end of the q!) #', "'SELECT name FROM DUAL;'", ' "You said, \"Use sed\"." ', ) { my ($extracted, $remainder) = extract_quotelike; print "STRING: $_\n"; print " TOOK: $extracted\n"; print " LEFT: $remainder\n"; print " ERROR: $@->{error}\n" if $@; }
        ---
        "A Jedi uses the Force for knowledge and defense, never for attack."
Re: Test if string is already quote()'d?
by aquarium (Curate) on Jun 15, 2003 at 10:50 UTC
    like the regex solutions above..if you don't need any quotes in the database whatsoever (which is a good idea anyway), strip them all, including escaped quotes \\' etc. and then quote all. But, you should realy be told by the source of the text whether it's already quoted or not: my first database theorem: SHIT in = SHIT out.
Re: Test if string is already quote()'d? (REGEX rox)
by gmpassos (Priest) on Jun 16, 2003 at 03:04 UTC
    Well, here's a regex function that I made for that:
    print is_quoted(q`I'm not quoted!`) . "\n" ; print is_quoted(q`"I'm \\"quoted!"`) . "\n" ; print is_quoted(q`"I'm not quoted!\"`) . "\n" ; print is_quoted(q`"I'm not \\\\"quoted!"`) . "\n" ; print is_quoted(q`""`) . "\n" ; print is_quoted(q`''`) . "\n" ; print is_quoted(q`"x"`) . "\n" ; print is_quoted(q`'x'`) . "\n" ; ############# # IS_QUOTED # ############# sub is_quoted { my $data = shift ; print "<<$data>>\n" ; $data =~ s/\\\\//gs ; ## Ignore \\ if ( $data =~ /^ ## Begin \s* ## Ignore spaces in begin. (?:(?!\\).|) ## Quote without \ before. (?: "[^"\\]?" ## Blank quoted or unique. | "(?:(?:\\")|[^"])+(?!\\)." ## quoted with values without " insid +e, but accept \" | '[^'\\]?' ## Blank quoted or unique. | '(?:(?:\\')|[^'])+(?!\\).' ## quoted with values without ' insid +e, but accept \' ) \s* ## Ignore spaces in end. $/sx ## End. /x => extend space for the regex. ) { return( 1 ) ;} ## return true return 0 ; ## return false ##better: # return undef ; }
    Note that I ignore spaces before and after the quoted values. If you don't want, just remove the \s* from the regex.

    I have added the print() inside the function just to follow the tests. Here's the output:

    <<I'm not quoted!>> 0 <<"I'm \"quoted!">> 1 <<"I'm not quoted!\">> 0 <<"I'm not \\"quoted!">> 0 <<"">> 1 <<''>> 1 <<"x">> 1 <<'x'>> 1

    Enjoy!

    Graciliano M. P.
    "The creativity is the expression of the liberty".

Re: Test if string is already quote()'d?
by mce (Curate) on Jun 16, 2003 at 07:32 UTC
    Hi,

    AFAIK DBI's quote will untaint your string. So you can look at the Taint module. You can use the taint function to before you quote, and the is_tainted to check whether it is quoted.

    Anyway, this is not really a nice solution, but it might work.

    I hope this helps
    ---------------------------
    Dr. Mark Ceulemans
    Senior Consultant
    BMC, Belgium