perlmeditation
gmax
<p> The [http://search.cpan.org/perldoc?DBI#fetchall_arrayref|DBI docs] say that fetchall_arrayref can get records in batches, as a compromise between fetching one-by-one and all-at-once. </p>
<p>The example from the manual goes like this.</p>
<code>
my $rows = []; # cache for batches of rows
while( my $row = ( shift(@$rows) || # get row from cache, or reload cache:
shift(@{$rows=$sth->fetchall_arrayref(undef,10_000)||[]}) )
) {
...
}
</code>
<p>This idiom makes sense, because it can fetch records at high speed, as shown in [id://273952].</p>
<p>A few days ago, brother [bradcathey] asked my help for a case that was similar to the idiom in the manual, with the addition of a slice, so that fetchall_arrayref produces a AoH instead of a AoA. This case led to an error, as shown in the following code.</p>
<readmore>
<code>
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use DBI;
my $driver = shift || 'mysql';
my $db = 'test';
my $user = undef;
my $pass = undef;
my $dbh;
$driver = lc $driver;
if ($driver eq 'mysql' ) {
$dbh = DBI->connect('DBI:mysql:'.$db
. ";mysql_read_default_file=$ENV{HOME}/.my.cnf"
, $user, $pass, {RaiseError => 1})
or die "Can't connect: $DBI::errstr\n";
printf "DBI: %s - DBD::mysql: %s\n", $DBI::VERSION, $DBD::mysql::VERSION;
}
elsif ($driver eq 'sqlite') {
$db = 'test.db';
if ( -f $db) {
unlink $db;
}
$dbh = DBI->connect('DBI:SQLite:'.$db, $user, $pass, {RaiseError => 1})
or die "Can't connect: $DBI::errstr\n";
printf "DBI: %s - DBD::SQLite: %s\n", $DBI::VERSION, $DBD::SQLite::VERSION;
}
else {
die "driver $driver not supported in this test\n";
}
my $max_fields = 3;
my $field_size = 9;
my $max_records = 10;
my $max_commit = 5;
my $text = 'abcdefghi';
my $inserted = 0;
my $create_query = qq{CREATE TABLE testdbi ( }
. (join q{,}, map { qq{id$_ char($field_size) not null} } ( 1..$max_fields ))
. qq{, primary key (id1))};
{
local $dbh->{PrintError} = 0;
eval { $dbh->do(qq{DROP TABLE testdbi}) };
}
$dbh->do(qq{begin});
$dbh->do($create_query);
my $sth = $dbh->prepare(qq{INSERT INTO testdbi VALUES (}
. (join q{,}, map {q{?}} (1..$max_fields))
.q{)} );
for (1..$max_records) {
$inserted += $sth->execute( map {$text++} (1..$max_fields) );
if (($inserted % $max_commit) == 0) {
$dbh->do(qq{commit});
print "$inserted\n";
$dbh->do(qq{begin});
}
}
$dbh->do(qq{commit});
print "inserted $inserted records\n";
my $max_rows = 3; #must leave a remainder for the error
#multiples of 2 and 5 work
print "without slice\n";
my $rowcache = [];
my $select_query = qq{SELECT id1, id2 FROM testdbi ORDER BY id1};
$sth = $dbh->prepare($select_query);
$sth->execute;
my $count = 0;
while (my $aref = ( shift(@{ $rowcache } )
|| shift (@{$rowcache=$sth->fetchall_arrayref(undef, $max_rows)
|| [] } ) ) )
{
my $rec = Data::Dumper->Dump([$aref],['rec']);
$rec =~ s/\s+/ /g;
$count++;
print "$count $rec \n";
};
#---- HERE STARTS THE FAILING CODE ------
print "with slice\n";
$rowcache = [];
$sth = $dbh->prepare($select_query);
$sth->execute;
$count = 0;
while (my $aref = ( shift(@{ $rowcache })
|| shift (@{$rowcache=$sth->fetchall_arrayref({}, $max_rows)
|| [] } ) ) )
{
my $rec = Data::Dumper->Dump([$aref],['rec']);
$rec =~ s/\s+/ /g;
$count++;
print "$count $rec \n";
};
__END__
$ perl test_dbi.pl mysql
DBI: 1.49 - DBD::mysql: 3.0002
5
10
inserted 10 records
without slice
1 $rec = [ 'abcdefghi', 'abcdefghj' ];
2 $rec = [ 'abcdefghl', 'abcdefghm' ];
3 $rec = [ 'abcdefgho', 'abcdefghp' ];
4 $rec = [ 'abcdefghr', 'abcdefghs' ];
5 $rec = [ 'abcdefghu', 'abcdefghv' ];
6 $rec = [ 'abcdefghx', 'abcdefghy' ];
7 $rec = [ 'abcdefgia', 'abcdefgib' ];
8 $rec = [ 'abcdefgid', 'abcdefgie' ];
9 $rec = [ 'abcdefgig', 'abcdefgih' ];
10 $rec = [ 'abcdefgij', 'abcdefgik' ];
with slice
1 $rec = { 'id1' => 'abcdefghi', 'id2' => 'abcdefghj' };
2 $rec = { 'id1' => 'abcdefghl', 'id2' => 'abcdefghm' };
3 $rec = { 'id1' => 'abcdefgho', 'id2' => 'abcdefghp' };
4 $rec = { 'id1' => 'abcdefghr', 'id2' => 'abcdefghs' };
5 $rec = { 'id1' => 'abcdefghu', 'id2' => 'abcdefghv' };
6 $rec = { 'id1' => 'abcdefghx', 'id2' => 'abcdefghy' };
7 $rec = { 'id1' => 'abcdefgia', 'id2' => 'abcdefgib' };
8 $rec = { 'id1' => 'abcdefgid', 'id2' => 'abcdefgie' };
9 $rec = { 'id1' => 'abcdefgig', 'id2' => 'abcdefgih' };
10 $rec = { 'id1' => 'abcdefgij', 'id2' => 'abcdefgik' };
DBD::mysql::st fetchall_arrayref failed: fetch() without execute() at test_dbi.pl line 106.
DBD::mysql::st fetchall_arrayref failed: fetch() without execute() at test_dbi.pl line 106.
$ perl test_dbi.pl sqlite
DBI: 1.49 - DBD::SQLite: 1.09
5
10
inserted 10 records
without slice
1 $rec = [ 'abcdefghi', 'abcdefghj' ];
2 $rec = [ 'abcdefghl', 'abcdefghm' ];
3 $rec = [ 'abcdefgho', 'abcdefghp' ];
4 $rec = [ 'abcdefghr', 'abcdefghs' ];
5 $rec = [ 'abcdefghu', 'abcdefghv' ];
6 $rec = [ 'abcdefghx', 'abcdefghy' ];
7 $rec = [ 'abcdefgia', 'abcdefgib' ];
8 $rec = [ 'abcdefgid', 'abcdefgie' ];
9 $rec = [ 'abcdefgig', 'abcdefgih' ];
10 $rec = [ 'abcdefgij', 'abcdefgik' ];
with slice
1 $rec = { 'id1' => 'abcdefghi', 'id2' => 'abcdefghj' };
2 $rec = { 'id1' => 'abcdefghl', 'id2' => 'abcdefghm' };
3 $rec = { 'id1' => 'abcdefgho', 'id2' => 'abcdefghp' };
4 $rec = { 'id1' => 'abcdefghr', 'id2' => 'abcdefghs' };
5 $rec = { 'id1' => 'abcdefghu', 'id2' => 'abcdefghv' };
6 $rec = { 'id1' => 'abcdefghx', 'id2' => 'abcdefghy' };
7 $rec = { 'id1' => 'abcdefgia', 'id2' => 'abcdefgib' };
8 $rec = { 'id1' => 'abcdefgid', 'id2' => 'abcdefgie' };
9 $rec = { 'id1' => 'abcdefgig', 'id2' => 'abcdefgih' };
10 $rec = { 'id1' => 'abcdefgij', 'id2' => 'abcdefgik' };
</code>
<p>As you can see from the sample output, the example fails when using both a slice and a row count with [CPAN://DBD::mysql], but it does not fail with [CPAN://DBD::SQLite]. However, changing $max_records to 1,000, the script fails with SQLite2 (segmentation fault).</p>
<p>While initially I was thinking that perhaps there was a bug in DBD::mysql, this latter fact made me think that perhaps using fetchall_arrayref with a slice and row count together was never an intended behavior. Why? Because the row count device is intended for speed, while the slice (leading to a hashref per record) is intended for clarity, but has a speed penalty that does not agree with the row count option.</p>
<p>Thus, I advised [bradcathey] to use the recipe described at [href://?node_id=284436#bindhash|DBI recipes /binding a hash], i.e. using bind_col with a static hash. This way, there can be clarity without suffering too much speed penalty.</p>
<p>So, what do you think? It is a bug, or wrong usage?</p>
</readmore>
<div class="pmsig"><div class="pmsig-127116">
<pre>
_ _ _ _
(_|| | |(_|><
_|
</pre>
</div></div>