http://qs321.pair.com?node_id=1000943


in reply to Processing ~1 Trillion records

Could you run this version of your script and report back the output?

# use DBI oracle
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect(
    "dbi:Oracle:host=server.domain.com; sid=sid; port=1521",
    "username", "password"
) or die "Can't connect to database $DBI::errstr\n";

# The SELECT query
# This query returns ~945,000,000,000 records
my $sql_chr = qq{
    select column_a, g.column_b, column_c, column_d, column_e
    from table_f f, table_g g
    where f.column_a = g.column_b
    and column_c != '-/-'
    and column_d = 'STR'
    and column_e = 'STR1'
    and column_a = 9
};

# Timestamps bracket the prepare and execute steps.
print time, "\n";
my $sth_chr = $dbh->prepare( $sql_chr );
print time, "\n";

if( $sth_chr->execute ) {
    my $now = time;
    print $now, "\n";

    # Fetch for 60 seconds and count how many rows come back.
    my $stop = $now + 60;
    my $s    = 0;
    while( my @dat = $sth_chr->fetchrow_array ) {
        $s++;
        last if time() > $stop;
    }
    print $s, "\n";
}
print time, "\n";

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use every day'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

RIP Neil Armstrong

Re^2: Processing ~1 Trillion records
by lo_tech (Scribe) on Oct 25, 2012 at 21:49 UTC
    Off the top of my head:

    1) Your SQL query is doing a dynamic hash inner join. You may get better results by making sure the join keys (as well as your selection criteria) are indexed fields; see the indexing sketch after this list.

    2) You're essentially slurping in db records to mark up the fields and dump them to files. If there is any way you can get around reading a trillion records into a hash (e.g., ORDER BY in the database... and the ordered fields are indexed) then you can read/markup/write the records retail, without thrashing ram/swap; a sketch of that follows too.
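
    A sketch of what 1) might look like, reusing the placeholder table and column names from the query above. Which filter columns live on which table isn't clear from the post, so treat the exact index definitions as guesses to be checked against the real schema and EXPLAIN PLAN:

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect(
            "dbi:Oracle:host=server.domain.com; sid=sid; port=1521",
            "username", "password", { RaiseError => 1 }
        );

        # Index the join key on one side of the join ...
        $dbh->do( q{ create index f_col_a_ix on table_f ( column_a ) } );

        # ... and cover the join key plus the filter columns on the other.
        # (Assumes column_d and column_e belong to table_g -- verify first.)
        $dbh->do( q{ create index g_filter_ix on table_g ( column_b, column_d, column_e ) } );

    With those in place the optimizer at least has the option of an index-driven join instead of hashing both tables.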

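    And a minimal sketch of the read/markup/write approach from 2), again using the placeholder schema. The actual markup step is whatever the OP does to the fields, so the join/print line below is just a stand-in:

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect(
            "dbi:Oracle:host=server.domain.com; sid=sid; port=1521",
            "username", "password", { RaiseError => 1 }
        );

        # Let the database do the ordering; the ordered field should be indexed.
        my $sth = $dbh->prepare( q{
            select column_a, g.column_b, column_c, column_d, column_e
            from table_f f, table_g g
            where f.column_a = g.column_b
            and column_c != '-/-'
            and column_d = 'STR'
            and column_e = 'STR1'
            order by column_a
        } );
        $sth->execute;

        open my $out, '>', 'marked_up.txt' or die "Can't open output: $!";

        # Read, mark up, and write one row at a time: constant memory,
        # no trillion-record hash.
        while( my @row = $sth->fetchrow_array ) {
            print {$out} join( "\t", @row ), "\n";    # markup step stands in here
        }
        close $out;
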
    Well, that's my $.02 worth. No, for refunds you'll have to check our customer service department.