
Re^2: DBI speed up needed on MySQL

by jacques (Priest)
on May 08, 2005 at 04:13 UTC ( #454963=note )

in reply to Re: DBI speed up needed on MySQL
in thread DBI speed up needed on MySQL

Do you really need all 200 columns when you select?

Yes, I do. :(

Replies are listed 'Best First'.
Re^3: DBI speed up needed on MySQL
by graff (Chancellor) on May 08, 2005 at 04:59 UTC
    Well, then you'll probably want to get the list of column names via a separate query, store those in an array (e.g. @colnames), then use the array to define the data query and load the hash:
    my $sqlstr = "select " . join(",", @colnames) . " from sometable ...";
    my $sth = $dbh->prepare( $sqlstr );
    $sth->execute;
    while ( $rowref = $sth->fetchrow_arrayref ) {
        @rowhash{@colnames} = @$rowref;
        ...
    }
    (untested, but it should give you a good-enough idea...; um, especially after the update to use "@$rowref" in the hash slice assignment instead of "$$rowref" which was wrong.)

    Since you're using mysql, you could get the list of column names by a query like "describe sometable" -- the column name will be the first column in each returned row. (There are other ways, but this is pretty simple.)
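    A sketch of that first query (untested; assumes an already-connected $dbh and that the table really is called sometable):

    ```perl
    # Hypothetical sketch: collect the column names from MySQL's DESCRIBE.
    # The field name is the first column of each returned row.
    my @colnames;
    my $desc = $dbh->prepare("DESCRIBE sometable");
    $desc->execute;
    while ( my $row = $desc->fetchrow_arrayref ) {
        push @colnames, $row->[0];
    }
    ```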

      See also the rather amazing "let's use hashes, but magically with bind" example in the DBI docs, in section "DBI STATEMENT HANDLE OBJECTS" "Statement Handle Methods" under 'bind_columns'. This uses the (most?) efficient way to populate a hash with the next record of data.
      $sth->execute;
      my %row;
      $sth->bind_columns( \( @row{ @{ $sth->{NAME_lc} } } ) );
      while ($sth->fetch) {
          print "$row{region}: $row{sales}\n";
      }

      Also, have you quantified the amount of data you need to paw through, in kilobytes or megabytes? If the sheer quantity you're moving (perhaps over a network connection) runs to many megabytes, that may be your real bottleneck.

      Well, then you'll probably want to get the list of column names via a separate query, store those in an array (e.g. "@colnames")...
      No you don't. Check out the NAME attribute of statement handles in the DBI docs. DBI fetches and stores the column names itself when you execute the query, and NAME holds a reference to that array.
      my $sth = $dbh->prepare("SELECT * FROM sometable");
      $sth->execute;
      my $colnames = $sth->{NAME};
      while (my $rowref = $sth->fetchrow_arrayref ) {
          @rowhash{@$colnames} = @$rowref;
          ...
      }

      However, DBI being highly optimized as it is, I don't expect any speed improvement over fetching the hashref.

      Why the need for a hash anyway? It's not ordered. Keeping the names and the data as two arrays holds more information than the hash on its own does: the column order. For printing, that order is presumably important. To loop through the arrays side by side, loop over the array index (of either array; they're the same size).
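      A toy, self-contained illustration of that side-by-side loop (the column names and values here are made up):

      ```perl
      # Walk two parallel arrays by a shared index to emit fields
      # in their original column order.
      use strict;
      use warnings;

      my $colnames = [ 'id', 'name', 'total' ];   # as you'd get from $sth->{NAME}
      my $rowref   = [ 42, 'widget', 7 ];         # as from fetchrow_arrayref

      my @pairs;
      for my $i ( 0 .. $#$colnames ) {
          push @pairs, "$colnames->[$i]=$rowref->[$i]";
      }
      print join( ", ", @pairs ), "\n";   # id=42, name=widget, total=7
      ```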

      You can always still map the field names to column numbers, the inverse of the array @$colnames, in a separate hash — and you need to do that only once:

      my %colcolumn;
      @colcolumn{@$colnames} = 0 .. $#$colnames;

      Still, I don't expect all this mucking about to perform any miracles. Otherwise, Tim Bunce and co. would really have had to have been messing about, and I don't think so.

        However, DBI being highly optimized as it is, I don't expect any speed improvement over fetching the hashref.

        It is highly optimized, but the docs point out that using fetchrow_hashref is slower than using other methods.
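        One way to see the difference for yourself is a rough benchmark against a throwaway in-memory table; a sketch using the core Benchmark module (this assumes DBD::SQLite is installed, and the table and row counts are invented here, so the numbers prove nothing about the real data):

        ```perl
        # Compare fetchrow_hashref vs. fetchrow_arrayref over the
        # same small table, 50 full scans each.
        use strict;
        use warnings;
        use DBI;
        use Benchmark qw(timethese);

        my $dbh = DBI->connect( "dbi:SQLite:dbname=:memory:", "", "",
                                { RaiseError => 1 } );
        $dbh->do("CREATE TABLE t (a INTEGER, b TEXT, c TEXT)");
        my $ins = $dbh->prepare("INSERT INTO t VALUES (?, ?, ?)");
        $ins->execute( $_, "x$_", "y$_" ) for 1 .. 1000;

        timethese( 50, {
            hashref  => sub {
                my $sth = $dbh->prepare("SELECT * FROM t");
                $sth->execute;
                1 while $sth->fetchrow_hashref;
            },
            arrayref => sub {
                my $sth = $dbh->prepare("SELECT * FROM t");
                $sth->execute;
                1 while $sth->fetchrow_arrayref;
            },
        } );
        ```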

        Why the need for a hash anyway?

        That's the spec I have been given. I need a hash ref. (Another application uses it.)

Re^3: DBI speed up needed on MySQL
by merlyn (Sage) on May 08, 2005 at 18:17 UTC
    Can you give a hint of how you can possibly come up with 200 columns in a design? I can't think of anything in life that has 200 aspects that can't be somehow factored out into some other subaspect hierarchy.

    For example, I recall the word "campaign" earlier in this thread (or maybe hallucinated it). If that's the case, and some of these column names are perhaps the names of each state, then you can turn the table sideways and create 50 separate rows per event in a one-to-many relationship instead of having 50 sets of columns (maybe 4 per thing?) to record the same thing.

    I dunno, but my gut tells me that ending up with 200 columns in an SQL table is a bit like ending up with 200 Perl scalar variables instead of an array. A symptom of that is a lot of variables with similar names differing only by some data-like aspect, like a serial number (1, 2, 3, 4) or a state (WA, OR, CA, ID, ...).

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

    update: As I was chatting with a friend about this problem a bit later, I realized that a problem like this usually results from improperly taking "3 dimensional data" (values depend on 3 inputs) and mapping it naively to a 2 dimensional table: the primary key column is one dimension, and the other columns represent a cross of the other two dimensions. The proper solution is to have two key columns represent two of the three dimensions, and let the column names be the third (hopefully the most diverse and consistent one).
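    A hypothetical sketch of that reshaping, using the campaign/state guess from above (all table and column names invented here):

    ```sql
    -- Naive 2-D mapping: column names cross state x measure.
    CREATE TABLE campaign_wide (
        event_id INT PRIMARY KEY,
        wa_spend DECIMAL, wa_votes INT,
        or_spend DECIMAL, or_votes INT
        -- ... and so on for every other state ...
    );

    -- Proper mapping: two key columns carry two dimensions,
    -- and the column names carry only the third (the measures).
    CREATE TABLE campaign_long (
        event_id INT,
        state    CHAR(2),
        spend    DECIMAL,
        votes    INT,
        PRIMARY KEY (event_id, state)
    );
    ```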

      I suspect that the OP has inherited this database, and is providing glue between two parts of a system that he doesn't have control over. 200 columns is probably a DB design problem, but it may be Somebody Else's Problem™. The OP is likely trying to make ends meet before they move the ends again - whether they're in the right place or not.

      The numbers were exaggerated just a little bit. I actually have 139 columns and close to 35,000 rows.
