Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

fastest method to use DBI

by Anonymous Monk
on Jul 07, 2009 at 04:52 UTC ( [id://777737]=perlquestion: print w/replies, xml ) Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying ti use DBI for accesing my oracle table which have 100,000 rows.FRom various articles i read fetchall_arrayref with binding () the colums give the fastest fetch .I am trying to fetch one row from database and to an array and first element is pushed as key and rest will be pushed as value which is an anonymous array.
my $sth = $dbn -> prepare("select number,id,start_dat,end_dat from SUB +SCRIBERSLIST"); $sth -> execute(); $sth->bind_columns(\$number, \$id,\$start_dat,\$end_dat); my $max_row=5000; my $rowcache; my %hash =(); while (my $aref = shift(@$rowcache) || shift ( @{$rowcache= $sth->fetchall_arrayref(undef, $max_rows)} ) ) { # $aref now contains only one row push @{$hash{$aref[0]}}, [ $aref[1],$aref[2],$aref[3] ]; }; $sth->finish(); $dbn->disconnect;
is it the right way to get the fastest fetch?

Replies are listed 'Best First'.
Re: fastest method to use DBI
by BrowserUk (Patriarch) on Jul 07, 2009 at 07:03 UTC

    The whole point of a SELECT statement is to select (only) the records you need!

    If you truly cannot do with anything less that the whole 100,000 rows, and if you need to reload them often enough that the load performance is a problem, then you would be better storing them (or cacheing) them somewhere that does not carry the overhead of the DB select and communications--eg. a file.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: fastest method to use DBI
by moritz (Cardinal) on Jul 07, 2009 at 07:26 UTC
    I think that a fetchall_arrayref is very memory intensive and not a good idea.

    I guess that DBI or the underlying drives either batch operations automatically, or have options to do so, or don't do it because it's not efficient. Either way I don't think that your "manual" caching will do very much good.

    Also you're using ->bind_columns(\$number, \$id,\$start_dat,\$end_dat); which means that after each ->fetch() the variables $number, $id etc have the new values. This is efficient, but only if you use ->fetch(), not ->fetchall_arrayref (which duplicates the work).

Re: fastest method to use DBI
by targetsmart (Curate) on Jul 07, 2009 at 05:01 UTC
      Hi, It would be great help if get an expert comment on the code. Thanks
        is it the right way to get the fastest fetch?
        I don't know, you have run it and give some statistics on the time taken, memory consumption of the perl program, etc.

        It would be great help if get an expert comment on the code
        I am not a database expert, but IMHO using the database query's OFFSET AND LIMIT is the best way to fetch huge number of records, it is will take only minimum memory(depending on the offset and limit values), and effective values for limit will enable you to fetch all records in a optimum number of fetch cycles.


        Vivek
        -- 'I' am not the body, 'I' am the 'soul', which has no beginning or no end, no attachment or no aversion, nothing to attain or lose.
Re: fastest method to use DBI
by JavaFan (Canon) on Jul 07, 2009 at 09:54 UTC
    Without knowing which driver you are using, most certainly not. The way you are doing it duplicates data - you are using bind variables and you are using fetchall_arrayref. Furthermore, you first fetch every row in a big table, then you loop over it (in a, IMO, weird way) and slightly rearrange it. You're doing quite a lot of data duplications, even before you do anything with the data.

    First you have to answer the question: do I need all 100000 rows before I do any processing? Or do you want to process each row? In which case you're (probably) better off fetching a row at the time, doing the processing, then fetching the next. Note I say probably - if the processing takes a long time, you are holding resources (perhaps even locks) in the database which may influence other processes accessing the data.

    AFAIK, bind variables and fetchrow_arrayref are the fastests way to retrieve data - with bind variables probably the fastest (but I haven't benchmarked it myself, and it may vary between drivers). I never use bind variables, as I don't like its action at a distance, but if fetching was the bottleneck of a time critical program, I'd certainly look into it.

      Yes, bind_columns/->fetchrow_arrayref (or its alias, ->fetch) is the fastest way to retrieve data, per the DBI docs:
      [fetchrow_arrayref] Fetches the next row of data and returns a reference to an array holding the field values. Null fields are returned as undef values in the array. This is the fastest way to fetch data, particularly if used with $sth->bind_columns.
      Perhaps the OP misread this as saying that the fastest option was fetchall_arrayref rather than fetchrow_arrayref?

      The recommended technique, then, would be:

      my $sth = $dbn->prepare("select number,id,start_dat,end_dat from SUBSC +RIBERSLIST"); $sth->execute(); $sth->bind_columns(\$number, \$id, \$start_dat, \$end_dat); my %hash =(); while ($sth->fetch) { # ->fetch populates the variables from ->bind_columns push @{$hash{$number}}, [$id, $start_dat, $end_dat]; }; $sth->finish(); $dbn->disconnect;

      But, as already noted, you should also SELECT only the rows you need and do your processing line-by-line instead of sucking in the whole table at once if possible. I've only addressed the mechanics of how the OP is pulling the SELECTed rows.

        I suspect what the OP saw was something about fetchall_arrayref bering the fastest way to fetch all the data, which is true since it doesn't require looping in perl. However, fetchall_arrayref doesn't work with bind_columns. Also, since the OP is only after a single row, there is no advantage to fetchall_arrayref.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://777737]
Approved by targetsmart
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (6)
As of 2024-04-25 15:28 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found