Quickest way to insert many rows with DBI

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I'm looking for ways to speed up a subroutine that involves the insertion of many (~300,000) records into a PostgreSQL database. At the moment I'm doing a prepare_cached() with placeholders, then running insert() for each record. Is there a more efficient way to do this, possibly based around bind_param_array and execute_array, or should I look elsewhere for my speed gains?

It looks to me like the databasing is the rate limiting step because when I run the program the cpu usage is about 60% while disk IO is going crazy.

I crave the benefit of your experience.

Code follows:

sub insert_nodes_one {
    my $sth = $dbh->prepare_cached("INSERT INTO node (taxid, name, tre
+e_id, parent, children, rank, common_name)
    VALUES (?, ?, '4', '-1', 'replace me', ?, ?);   ")
      or die "Cannot prepare: " . $dbh->errstr();

    foreach (keys %tax_lookup) {
        first_pass($_, $sth, \$n, $count);
    }
}

sub first_pass{
my $key = $_[0];
my $sth = $_[1];
my $n= $_[2];
my $count= $_[3];
my $name = $tax_lookup{$key}{name};
    $name =~ s/'/_/g;
        my $common_name = $tax_lookup{$key}{common_name};
        $common_name = 'none' unless ($common_name);
        $common_name =~ s/'/_/g;

        my $parent_taxid = $tax_lookup{$key}{parent_taxid};
        my $rank         = $tax_lookup{$key}{rank};
        if ($tax_lookup{$key}{rank} eq 'no') {$rank = 'none'}
        my $children_taxid = join ' ', @{$tax_lookup{$key}{children_ta
+xid}};

        unless ($$n % 100) {print "\r($$n / $count)"}

        $sth->execute($key, $name, $rank, $common_name);
        $$n++;
}
[download]

Comment on Quickest way to insert many rows with DBI Download Code

Replies are listed 'Best First'.
Re: Quickest way to insert many rows with DBI by borisz (Canon) on Apr 11, 2006 at 09:41 UTC
It looks that you fill the tables for the first time. Try to use the copy command. Search for COPY Support in DBD::Pg. Boris	[reply]
Re: Quickest way to insert many rows with DBI by holli (Abbot) on Apr 11, 2006 at 09:41 UTC
Postgres supports the copy command, though I'm not sure how to use it via DBI: `Command: COPY Description: Copies data between files and tables Syntax: COPY [ BINARY ] table [ WITH OIDS ] FROM { 'filename' \| stdin } [ [USING] DELIMITERS 'delimiter' ] [ WITH NULL AS 'null string' ] COPY [ BINARY ] table [ WITH OIDS ] TO { 'filename' \| stdout } [ [USING] DELIMITERS 'delimiter' ] [ WITH NULL AS 'null string' ]` [download] holli, /regexed monk/	[reply] [d/l]
Re^2: Quickest way to insert many rows with DBI by Anonymous Monk on Apr 11, 2006 at 09:51 UTC
Are you suggesting I write out the data to a flat file first, then use the copy command to populate the table?	[reply]
Re^3: Quickest way to insert many rows with DBI by rinceWind (Monsignor) on Apr 11, 2006 at 09:57 UTC
I think that's what holli has in mind. You could use a piped open (or open2 or open3) to write to your copy command's stdin. Update: Looks like borisz has a better idea with DBD::Pg's COPY support. -- Oh Lord, won’t you burn me a Knoppix CD ? My friends all rate Windows, I must disagree. Your powers of persuasion will set them all free, So oh Lord, won’t you burn me a Knoppix CD ? (Missquoting Janis Joplin)	[reply]
Re: Quickest way to insert many rows with DBI by merlyn (Sage) on Apr 11, 2006 at 12:20 UTC
I don't see the transaction steps, so forgive me if you just didn't show them. But the common knowledge is that you need to do large inserts inside a transaction, and group that in chunks of 1000 lines or so (don't know where 1000 comes from, but people smarter than me have said this). WIthout that, each insert goes through the "begin transaction, do action, commit transaction" overhead, and it's sloooooooww. Also, if you have indexes, add them after the entire load is complete. No sense spending a lot of work to maintain a index that won't be queried in its current state. Even cooler, because DDL in PostgreSQL can also be made part of a transaction, you can begin the transaction, create the table, do all the data loading, add any index and triggers, and then finally commit. Any other thread watching the database would see nothing until the final commit, so it'll look like the entire table sprang into existence fully formed! Even Oracle can't do that, I'm told. (Yes, you can roll back a table drop or schema change in PostgreSQL. Cool.) -- Randal L. Schwartz, Perl hacker Be sure to read my standard disclaimer if this is a reply.	[reply]
Re: Quickest way to insert many rows with DBI by Anonymous Monk on Apr 11, 2006 at 11:25 UTC
Thanks for all your responses - using `$dbh->do("COPY node(taxid, name, tree_id, parent, children, rank, comm +on_name) FROM STDIN"); $dbh->pg_putline("$_\tasdf\t4\t-1\treplace me\tqwer\tzxcv\n");` [download] proves to be blisteringly fast. Cheers M	[reply] [d/l]
Re: Quickest way to insert many rows with DBI by tweetiepooh (Hermit) on Apr 11, 2006 at 10:19 UTC
Don't know postgres but in Oracle the fastest way (or one of them) is similar to that already suggested. Get a nice clean flat data file and use a loader utility or command. In Oracle you'd switch off indexes and constraints and set buffers etc big enough to get lots of rows in before a commit. After the load apply the contstaints and recreate the indices.	[reply]
Re: Quickest way to insert many rows with DBI by Herkum (Parson) on Apr 11, 2006 at 12:18 UTC
A co-worker of mine was doing the same thing with DB2. What he found was each time he did an insert that it was creating a new cursor for each insert. That slowed the process down. What he ended up doing was turning auto-commit off, and then using DBI transaction control (begin_work() and commit() ) and would 'commit' every 1000 rows. It went from taking 8 hours to 4 hours, so a slight improvement. I don't know if Postgre will act in the same manor but it would be worth a shot...	[reply]
A reply falls below the community's threshold of quality. You may see it by logging in.


Pathologically Eclectic Rubbish Lister
	PerlMonks