Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid

Re: Why a reference, not a hash/array?

by dynamo (Chaplain)
on Jul 07, 2007 at 01:15 UTC ( [id://625376] : note . print w/replies, xml ) Need Help??

in reply to Why a reference, not a hash/array?

There's a definite advantage - speed. If you use this version: %hash = $sth->fetchrow_hash(), Perl will have to fetch the whole row and stuff it into the hash. If the row happens to have 100 cols and 50 of them have 1meg of data, it will (usually) be much slower than grabbing a ref and getting only what is needed when it is needed.

Your bonus question is directly related. To implement fetchall_array(), it'd have to slurp the entire table just to stuff it into your @array.

Not terribly efficient. If you really want to do that, they make you write it yourself. One fairly easy way would be to copy the deref'd array ref into a new array:

@data = @{$sth->fetchall_arrayref()};

Replies are listed 'Best First'.
Re^2: Why a reference, not a hash/array?
by Cody Pendant (Prior) on Jul 07, 2007 at 01:26 UTC
    That makes sense, except that I now realise that somewhere in my mental model I don't understand where the data is that we're referencing.

    When I do a "select * from foo where bar" using DBI, I first do something like $sth->execute() and then I do my while $sth->fetchrow_hashref(){}.

    I guess I've never understood what those two steps mean. The first one gets all the matching data from the database into some temporary/intermediate place, and then the row code parses through it one row at a time? So, if I never need to use that pesky column with the 50KB of data in it, it never makes it into memory? So, where is it?

    Nobody says perl looks like line-noise any more
    kids today don't know what line-noise IS ...
      My understanding on this isn't the most solid, but I believe it depends on the DB itself and the optimization of the DBD driver.

      So, for example, you call execute() and DBI talks to the DB via an API to fetch the data into memory - the DB software's memory. When your perl app needs that data, you would call something like fetchrow_hashref(), which then makes the API call to retrieve each successive row's data from the DB to perl.

      So, the short answer is: When you use DBI like the docs recommend, the data is where you need it when you want it, without cluttering up the heap. (keeping your RAM usage lower)
      Essentially, you're worried about details you don't need to worry about. At a guess, when you do an "execute", the information isn't loaded into perl's memory yet, but rather will (usually) be sitting in a buffer inside the database itself. Once you do a "fetchall" of some sort, then the entire result set is definitely loaded into perl's memory, and for efficiency's sake what you want is to get a reference to that datastructure, you don't want to make a second copy of it just to save you from thinking about references.

      You understand, that when you do a:

      $working_href = $returned_href;
      that's just making a copy of a scalar value, but if you do a:
      %working = %returned;
      that makes a copy of the entire data-structure. That doubles you memory usage, and wastes time in making the copy... so you don't want to do that unless you have a really good reason (like you're planning on modifying the copy and you need to preserve the original).

      By the way, if you need to join an array given a reference to it, you just do this:

      my $string = join " ", @{ $aref };
      There's no "extra step" you need to do.

      The prepare/execute/fetch model is there because you will often want to prepare a statement once using placeholders, and then execute it several times using different data each time. (This is faster, and arguably more secure since it helps separate code from data)

      my $sth = $dbh->prepare("insert into t (foo,bar) values (?,?)"); $sth->execute("foo", 1); $sth->execute("bar", 2); $sth->execute("baz", 3);
      Also, I tend to use the following loop a lot when fetching return data; it allows me to peek at the data and make certain changes that wouldn't have been efficient or possible on the DB server:
      while (my $record = $sth->fetchrow_hashref) { # Manipulate record as needed $record->{'foo'} = &bar( $record->{'baz'} ); push @records, $record; }
      And voilá, I have an array of hashrefs. Very convenient.
Re^2: Why a reference, not a hash/array?
by Cody Pendant (Prior) on Jul 07, 2007 at 01:36 UTC
    Also this:

    @data = @{$sth->fetchall_arrayref()};

    Doesn't DWIW! If I do this:

    @data = @{$sth->fetchall_arrayref()}; print Dumper(\@data)
    I get this:
    $VAR1 = [ [ 'foo' ], [ 'bar' ], [ 'baz' ] ];
    that is, I've got an AoA -- I just want ... an A!

    Nobody says perl looks like line-noise any more
    kids today don't know what line-noise IS ...
      The "fetch*" methods family mostly about returning a row, one at a time. The "*array_ref" indicates that a row will be returned as an array reference, so virtually you get ['value', 'for', 'each', 'column'] instead of ('value', 'for', 'each', 'column'). The "all" part indicates that all that array references will be returned, representing all results match the query. And the whole is bundled in another array reference. So you end up with AoA:
      $all_rows = [ [R1C1, R1C2, ..., R1Cn], [R2C1, R2C2, ..., R2Cn], ..., [RnC1, RnC2, ..., RnCn], ];
      Doesn't DWIW! If I do this:

      that is, I've got an AoA -- I just want ... an A!

      Well, AFAIK, Perl is more DWIM rather than DWIW. You may W an A, but by using fetchall_arrayref, you really M AoA. If you really M an A and you only want a single column from each row, then you can use selectcol_arrayref instead.
      # Table: group # Columns: id, name # Values: 1, Admin # 2, User # 3, Public my $sql = 'select * from group'; my $r = $dbh->selectcol_arrayref($sql); # default to first column # or, if you want the the 'name' column my @r = @{$dbh->selectcol_arrayref($sql, {Columns=>[2]})}; # we want t +he 2nd column # if you know the column name you want in advanced: my $sql2 = 'select name from group'; my @group_names = @{$dbh->selectcol_arrayref($sql2)}; print Data::Dumper->Dump( [$r, \@r, \@group_names], [qw(id name group_names)], );
      Output with Data::Dumper:
      $id = [ '1', '2', '3', ]; $name = [ 'Admin', 'User', 'Public', ]; $group_names = [ 'Admin', 'User', 'Public', ];

      Update: (09-07-2007) Fixed typo, corrected syntax to Data::Dumper->Dump() (was Data::Dumper(...)). I know I should just copy/paste instead of retyping. *sighs*

      Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!

        naikonta, you definitely do get extra points for pointing me to that module and that method, neither of which I knew about. Very helpful, and effectively pointing out that I had something of an X-Y problem.

        I do indeed need to select a single column into an array, and of course selectall methods should return AoAs.

        Nobody says perl looks like line-noise any more
        kids today don't know what line-noise IS ...

      fetchall_arrayref fetches rows and each row may contain many fields, hence the AoA structure.

      DWIM is Perl's answer to Gödel
      my @data = map { $_->[0] } @{ $sth->fetchall_arrayref };
      Something like this:
      use DBIx::Simple; my @names = $db->query('SELECT name FROM people WHERE id > 5')->flat;
      fetches everything into "one big flattened list."