Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris

Re: DBI speed up needed on MySQL

by thor (Priest)
on May 08, 2005 at 03:51 UTC ( #454959=note: print w/replies, xml ) Need Help??

in reply to DBI speed up needed on MySQL

Do you really need all 200 columns when you select? If not, you could select only those that you do need and save yourself some bandwidth. Also, I don't know if it's faster than fetchrow_hashref, but I like to use fetch like so when I need access to the columns by name
my @fields = qw(foo bar baz); my %results; $sth->execute(); $sth->bind_columns(@results{@fields}); while( $sth->fetch() ) { #your results will be available in %results }
Maybe I should benchmark it...


Feel the white light, the light within
Be your own disciple, fan the sparks of will
For all of us waiting, your kingdom will come

Replies are listed 'Best First'.
Re^2: DBI speed up needed on MySQL
by jacques (Priest) on May 08, 2005 at 04:13 UTC
    Do you really need all 200 columns when you select?

    Yes, I do. :(

      Well, then you'll probably want to get the list of column names via a separate query, store those in an array (e.g. "@colnames", then use the array to define the data query and then load the hash:
      my $sqlstr = "select " . join(",", @colnames) . " from sometable ..."; my $sth = $dbh->prepare( $sqlstr ); while ( $rowref = $sth->fetchrow_arrayref ) { @rowhash{@colnames} = @$rowref; ... }
      (untested, but it should give you a good-enough idea...; um, especially after the update to use "@$rowref" in the hash slice assignment instead of "$$rowref" which was wrong.)

      Since you're using mysql, you could get the list of column names by a query like "describe sometable" -- the column name will be the first column in each returned row. (There are other ways, but this is pretty simple.)

        See also the rather amazing "let's use hashes, but magically with bind" example in the DBI docs, in section "DBI STATEMENT HANDLE OBJECTS" "Statement Handle Methods" under 'bind_columns'. This uses the (most?) efficient way to populate a hash with the next record of data.
        $sth->execute; my %row; $sth->bind_columns( \( @row{ @{$sth->{NAME_lc} } } )); while ($sth->fetch) { print "$row{region}: $row{sales}\n"; }

        Also, have you quantified the amount of data you need to paw through, in kilobytes or megabytes? Perhaps the sheer quantity of data you're moving (perhaps through a network connection?) is many many megabytes?

        Well, then you'll probably want to get the list of column names via a separate query, store those in an array (e.g. "@colnames")...
        No you don't. Check out the property NAME of the statement handles in the DBI docs. It fetches and stores those column names into an array, when you execute the query. This property contains an array ref to it.
        my $sth = $dbh->prepare("SELECT * FROM sometable"); my $colnames = $sth->{NAME}; while (my $rowref = $sth->fetchrow_arrayref ) { @rowhash{@$colnames} = @$rowref; ... }

        However, DBI being highly optimized as it is, I don't expect any speed improvement over fetching the hashref.

        Why the need for a hash anyway? It's not ordered. So, keeping the names and data as two arrays, holds more information for you, than just the hash on its own does: the column order. I would believe that for printing, the order is very important. To loop through the arrays side by side, loop through the array index (for either array, they're the same size).

        You can always still map the field names to column numbers, the inverse of the array @$colnames, in a separate hash — and you need to do that only once:

        my %colcolumn; @colcolumn{@$colnames} = 0 .. $#$colnames;

        Stil, I don't expect all this mucking about to perform any miracles. Otherwise, Tim Bunce and co. would really have had to have been messing about. I don't think so.

      Can you give a hint of how you can possibly come up with 200 columns in a design? I can't think of anything in life that has 200 aspects that can't be somehow factored out into some other subaspect hierarchy.

      For example, I recall the word "campaign" earlier in this thread (or maybe hallucinated it). If that's the case, and some of these column names are perhaps the names of each state, then you can turn the table sideways and create 50 separate rows per event in a one-to-many relationship instead of having 50 sets of columns (maybe 4 per thing?) to record the same thing.

      I dunno, but my gut tells me that ending up with 200 columns in an SQL table is a bit like ending up with 200 Perl scalar variables instead of an array. A symptom of that is a lot of variables with similar names differing only by some data-like aspect, like a serial number (1, 2, 3, 4) or a state (WA, OR, CA, ID, ...).

      -- Randal L. Schwartz, Perl hacker
      Be sure to read my standard disclaimer if this is a reply.

      update: As I was chatting with a friend about this problem a bit later, I realized that the problem like this usually results from improperly taking "3 dimensional data" (values depend on 3 inputs) and mapping it to a 2 dimensional table naively: primary key column is one dimension, other columns represent a cross of the other two dimensions. The proper solution is to have two key columns represent two of the three dimensions, and the column names be the third (and hopefully the most diverse and consistent).

        I suspect that the OP has inherited this database, and is providing glue between two parts of a system that he doesn't have control over. 200 columns is probably a DB design problem, but it may be Somebody Else's Problem™. The OP is likely trying to make ends meet before they move the ends again - whether they're in the right place or not.

        The numbers were exaggerated just a little bit. I actually have 139 columns and close to 35,000 rows.

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://454959]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (4)
As of 2021-09-20 11:33 GMT
Find Nodes?
    Voting Booth?

    No recent polls found