Re: Basic Perl trumps DBI? Or my poor DB design?

You came to PerlMonks three weeks ago, and started asking questions like:

Then, one week later, all of a sudden you have become a database and benchmarking expert and started blowing your trumpet (Basic Perl array intersection faster than mysql query join.) until today you releasedthe full wisdom of you findings.

I don't buy it. To me it appears that the thing you have understood better so far (Perl) is better than the thing you haven't fully understood (the database).

If you want to propose a public comparison, you have to publish the test data you were using. If the data is classified and can't be released, then generate some dummy data and release either the generation script or the data itself (perhaps by means of an external link). If you do that, then perople here could do better than guessing what could be behind your reasoning.

As for me, I don't believe your benchmarking for a simple reason: databases are definitely better than Perl at crunching tabular data, provided that you use them correctly.

This is actually a challenge. Show us the data, and then we'll see which method is really the best one.

Comment on Re: Basic Perl trumps DBI? Or my poor DB design?

Replies are listed 'Best First'.

Re: Basic Perl trumps DBI? Or my poor DB design?
by punch_card_don (Curate) on Oct 23, 2004 at 13:34 UTC

As for me, I don't believe your benchmarking for a simple reason: databases are definitely better than Perl at crunching tabular data, provided that you use them correctly.

Darn - and I had so enjoyed the unanimity of the "yes, that's what you should expect" crowd so far.

Anyway - no matter. And no problem with the challenge. I've been using vanilla test data anyway. Here it is:

DB 0

    $num_respondents = 20000;
    $offset = 0;
    for $respondent (1 .. $num_respondents) {
        $sql = "INSERT INTO theTable VALUES ($respondent";
        for $response (1 .. $offset) {
            $sql = $sql.", 0";
        }
        $response = $offset + 1;
        while ($response < $num_responses+1) {
            $sql = $sql.", 1";
            if (($num_responses - $response) > 4) {
                $sql = $sql.", 0, 0, 0, 0";
            }
            else {
                for $response ($response+1 .. $num_responses) {
                    $sql = $sql.", 0";
                }
            }
            $response = $response + 5;
        }
        if ($respondent % 2) {$sql = $sql.", 1, 0";}
        else {$sql = $sql.", 0, 1";}
        $sql = $sql.")";
        $sth = $dbh->prepare($sql) or die("Could not prepare!" . $dbh-
+>errstr);
        print "<p>$sql\n";
        $sth->execute() or die("Could not execute!" . $dbh->errstr);
        if ($offset < 4) {$offset++;}
        else {$offset = 0;}
    }
[download]

respondent_1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0...
respondent_2, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0...
respondent_3, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0...
[download]

DB 1 & DB 2

    $sql = "INSERT INTO theTable (response, respondent) VALUES (?,?)";
    $sth = $dbh->prepare($sql) or die("Could not prepare!" . $dbh->err
+str);
    $i = 1;
    for $respondent (1 .. 20000) {
        for $r (0 .. 600) {
            $response = $i + ($r * 5);
            $sth->execute($response, $respondent) or die("Could not ex
+ecute!" . $dbh->errstr);
        }
        if ($respondent % 2) {
            $response = 9999;
            $sth->execute($response, $respondent) or die("Could not ex
+ecute!" . $dbh->errstr);
        }
        else {
            $response = 6666;
            $sth->execute($response, $respondent) or die("Could not ex
+ecute!" . $dbh->errstr);
        }
        if ($i < 5) {$i++;}
        else {$i = 1;}
    }
[download]

1, 1
6,1
11, 1
...
2, 2
7, 2
12, 2
...
3, 3
8, 3
13, 3
...
[download]

DB 3

    $respondents_by_5 = 4000;
    $sql = "SHOW TABLES";
    $sth = $dbh->prepare($sql) or die("Could not prepare!" . $dbh->err
+str);
    $sth->execute() or die("Could not execute!" . $dbh->errstr);
    $j = 1;
    while ($table = $sth->fetchrow_array()) {
        if ($table ne 't6000' && $table ne 't9000') {
            $sql = "INSERT INTO ".$table." VALUES (?)";
            $std = $dbh->prepare($sql) or die("Could not prepare! At t
+able = $table because " . $dbh->errstr);
            for $i (0 .. $respondents_by_5) {
                $respondent = $j + ($i * 5);
                $std->execute($respondent) or die("Could not execute!"
+ . $dbh->errstr);
            }
            if ($j < 5) {$j++;}
            else {$j = 1;}
        }
    }
    $sql = "INSERT INTO t9999 VALUES (?)";
    $stf = $dbh->prepare($sql) or die("Could not prepare! At table = $
+table because " . $dbh->errstr);
    $sql = "INSERT INTO t6666 VALUES (?)";
    $stm = $dbh->prepare($sql) or die("Could not prepare! At table = $
+table because " . $dbh->errstr);
    for $respondent (1 .. 20000) {
        if ($respondent % 2) {
            $stm->execute($respondent) or die("Could not execute!" . $
+dbh->errstr);
        }
        else {
            $stf->execute($respondent) or die("Could not execute!" . $
+dbh->errstr);
        }
    }
[download]

Table_1:1, 6, 11, 16, ..., 19996
Table_2:2, 7, 12, 17, ..., 19997
Table_3:3, 8, 13, 18, ..., 19998
Table_4:4, 9, 14, 19, ..., 19999
Table_5:5, 10, 15, 20, .., 20000
Table_6:1, 6, 11, 16, ..., 19996
etc.
[download]

DB 4

    $j = 1;
    for $r (1 .. $num_responses) {
        @lines = ();
        $table = ">".$db_dir."/r".$r;
        for $i (0 .. $respondents_by_5) {
            $respondent = $j + ($i * 5);
            push (@lines, $respondent);
        }
        &write_lines_to_file(1, $table, @lines);
        if ($j < 5) {$j++;}
        else {$j = 1;}
    }
    for $respondent (1 .. 20000) {
        if ($respondent % 2) {
            push(@lines_6666, $respondent);
        }
        else {
            push(@lines_9999, $respondent);
        }
    }
    $table = ">".$db_dir."/r6666";
    &write_lines_to_file(1, $table, @lines_6666);
    $table = ">".$db_dir."/r9999";
    &write_lines_to_file(1, $table, @lines_9999);
[download]

sub write_lines_to_file {
    my $add_line_return = shift;
    my $file = shift;
    my @lines=@_;
    open(FILE, ">$file") or dienice("cannot open file : $!");
    $STDOUT_Handle = select(FILE); 
    for my $i (0 .. $#lines) {
        print "$lines[$i]";
        if ($add_line_return) {print "\n";}
    }
       close(FILE);
    select($STDOUT_Handle);
}
[download]

If you can see something that alters my understanding, I'm all ears.

[reply]
[d/l]
[select]


Clear questions and runnable code get the best and fastest answer
	PerlMonks