Re-indexing a SQL resultset by id

metaperl has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have a query which returns a set of rows. Each row has an id field:

$VAR1 = [
          {
            'amount' => '58.80',
            'quantity' => '1',
            'product_listid' => '800000FA-1450218051',
            'id' => '3938716373870073',
            'customer_listid' => '8000024C-1450196328',
            'datetime' => '1309966331'
          },
          {
            'amount' => '1.34',
            'quantity' => '1',
            'product_listid' => '800000FC-1450218057',
            'id' => '3938716373870073',
            'customer_listid' => '8000024C-1450196328',
            'datetime' => '1309966331'
          },
          {
            'amount' => '2.14',
            'quantity' => '1',
            'product_listid' => '8000010C-1450218105',
            'id' => '2446438717782054',
            'customer_listid' => '80000245-1450196318',
            'datetime' => '1310146917'
          },
          {
            'amount' => '1.34',
            'quantity' => '1',
            'product_listid' => '800000FC-1450218057',
            'id' => '2446438717782054',
            'customer_listid' => '80000245-1450196318',
            'datetime' => '1310146917'
          }
        ];

Now, I want to re-index this flat set of rows by that id field as follows:

{ $id1 => [ { $hashrefs_with_id1 }, ... ],
  $id2 => [ { $hashrefs_with_id2 }, ... ],
}

I tried the map_hashes method of my favorite DBI wrapper, DBIx::Simple, but it only returns one hashref for each mapping key:

$VAR1 = {
          '2446438717782054' => {
                                  'amount' => '1.34',
                                  'quantity' => '1',
                                  'product_listid' => '800000FC-1450218057',
                                  'customer_listid' => '80000245-1450196318',
                                  'datetime' => '1310146917'
                                },
          '3938716373870073' => {
                                  'amount' => '1.34',
                                  'quantity' => '1',
                                  'product_listid' => '800000FC-1450218057',
                                  'customer_listid' => '8000024C-1450196328',
                                  'datetime' => '1309966331'
                                }
        };

and I want an arrayref of them all. It's easy enough for me to write something to do this:

my @row = $r->hashes; # flat list of hashrefs, one per row
my %row;
for my $row (@row) {
  push @{$row{$row->{id}}}, $row;
}
warn Dumper(\%row);

but someone has to have needed this and written it already... maybe there's a data structure utility that does this sort of reworking of an array?

UPDATE: DBIx::SQLCrosstab does exactly what I want, except that it aggregates the non-keyed columns instead of making an arrayref of hashrefs out of them. For now, I will just code it.
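For reference, the loop above can be wrapped in a small reusable helper. This is only a minimal sketch using core Perl: `group_rows_by` is a hypothetical name, not something provided by DBI or DBIx::Simple, and the rows are a trimmed-down copy of the result set shown earlier.

```perl
use strict;
use warnings;

# Group a flat list of hashrefs into a hash of arrayrefs, keyed by one field.
# group_rows_by is a hypothetical helper name, not part of DBI or DBIx::Simple.
sub group_rows_by {
    my ($field, @rows) = @_;
    my %grouped;
    for my $row (@rows) {
        # Autovivification creates the arrayref on first use of each key.
        push @{ $grouped{ $row->{$field} } }, $row;
    }
    return \%grouped;
}

# Trimmed-down rows from the result set shown above.
my @rows = (
    { id => '3938716373870073', amount => '58.80' },
    { id => '3938716373870073', amount => '1.34'  },
    { id => '2446438717782054', amount => '2.14'  },
    { id => '2446438717782054', amount => '1.34'  },
);

my $by_id = group_rows_by(id => @rows);

# Print the amounts collected under each id.
for my $id (sort keys %$by_id) {
    my @amounts = map { $_->{amount} } @{ $by_id->{$id} };
    print "$id: @amounts\n";
}
```

Every value in the returned hash is an arrayref, even for an id that occurs only once, so callers never need to check the shape of a value.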

The mantra of every experienced web application developer is the same: thou shalt separate business logic from display. Ironically, almost all template engines allow violation of this separation principle, which is the very impetus for HTML template engine development.

-- Terence Parr, "Enforcing Strict Model View Separation in Template Engines"

Replies are listed 'Best First'.
Re: Re-indexing a SQL resultset by id by Juerd (Abbot) on Jul 09, 2011 at 22:28 UTC
It seems to me that your database design (or the name of the 'id' column) is broken if you have non-unique IDs.

As for supporting this in the module: I'm not inclined to support something in DBIx::Simple that makes the value type returned dependent on the contents of the result set. That results in rather fragile code.

You could just add a method, though:

# untested
package DBIx::Simple::Result::Metaperl;
our @ISA = qw(DBIx::Simple::Result);

sub map_arrays_of_hashes {
    my ($self, $column_name) = @_;
    my %arrays;
    for my $row ($self->hashes) {
        # group by the requested column rather than a hard-coded 'id'
        push @{ $arrays{ $row->{$column_name} } }, $row;
    }
    return wantarray ? %arrays : \%arrays;
}

...

$db->result_class = 'DBIx::Simple::Result::Metaperl';
my %arrays_of_hashes = $db->query(...)->map_arrays_of_hashes('id');

To keep DBIx::Simple simple, I'm not adding a proliferation of variants; a vast number of map_foos_of_bars permutations could be thought of, and I'm sure they could all be useful in someone's code somewhere, but let's draw a line... :)

Juerd
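The point about a return type that depends on the data can be illustrated with a small sketch. `collate_mixed` is a hypothetical function written only for this illustration; it is not part of DBIx::Simple and not something anyone in the thread proposed verbatim. It stores a lone row as a bare hashref and several rows as an arrayref, which forces every caller to check the shape first.

```perl
use strict;
use warnings;

# Hypothetical "convenient" collation: a lone row is stored as a bare hashref,
# several rows as an arrayref. The value type now depends on the data.
sub collate_mixed {
    my ($field, @rows) = @_;
    my %out;
    for my $row (@rows) {
        my $key = $row->{$field};
        if (!exists $out{$key}) {
            $out{$key} = $row;                  # first row: bare hashref
        }
        elsif (ref($out{$key}) eq 'ARRAY') {
            push @{ $out{$key} }, $row;         # third and later rows
        }
        else {
            $out{$key} = [ $out{$key}, $row ];  # second row: promote to arrayref
        }
    }
    return \%out;
}

my @rows = (
    { id => 1, amount => '58.80' },
    { id => 1, amount => '1.34'  },
    { id => 2, amount => '2.14'  },
);

my $mixed = collate_mixed(id => @rows);

# Every caller has to branch on the shape before touching a value.
for my $id (sort keys %$mixed) {
    my $value = $mixed->{$id};
    my @group = ref($value) eq 'ARRAY' ? @$value : ($value);
    printf "%s: %d row(s)\n", $id, scalar @group;
}
```

A method such as `map_arrays_of_hashes`, which always returns arrayrefs, avoids that branching altogether.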
Re^2: Re-indexing a SQL resultset by id by metaperl (Curate) on Jul 09, 2011 at 23:36 UTC
"It seems to me that your database design (or the name of the 'id' column) is broken if you have non-unique ID's."

FALSE:

CREATE TABLE students (
  id        SERIAL PRIMARY KEY,
  firstname TEXT,
  lastname  TEXT
);

CREATE TABLE classes (
  id   SERIAL PRIMARY KEY,
  name TEXT
);

CREATE TABLE student_classes (
  studentid INT,
  classid   INT
);

and now:

SELECT * FROM student_classes;

Will you get several instances of each id? Yes. Is the database design un-normalized? No.

Conclusion: there are reasonable queries leading to multiple instances of certain fields by which one might want to collate results. In this case, I might want a hash keyed by student id, with an array ref of all results for that student id as the value. Call it "collation".
Re^3: Re-indexing a SQL resultset by id by Juerd (Abbot) on Jul 09, 2011 at 23:42 UTC
"SELECT * FROM student_classes; Will you get several instances of each id? Yes?"

Yes, but these fields are not called 'id'; here, it's just a suffix.

A join, however, could result in multiple rows of the result set with the same id, even in a nice database design, so I do retract my observation about the database perhaps not being well-designed.

Juerd
Re: Re-indexing a SQL resultset by id by sundialsvc4 (Abbot) on Jul 08, 2011 at 22:17 UTC
If you could arrange things so that the query specifies `ORDER BY id`, then your logic here would become very simple. Every occurrence of any given `id` value would then be adjacent by definition, and your logic could simply remember the value from the preceding row and notice when it changes.

Perl’s “auto-vivification” feature can come in very handy here. If we have a hashref `$foo` which is to contain an arrayref of the values found for each key, we can write very “streamlined” logic such as the following:

while (my $row = $query->fetchrow_hashref) {
    my $key = $$row{'key'};    # PEDANTIC FOR CLARITY ...
    my $val = $$row{'value'};
    push @{ $$foo{$key} }, $val;
}

(caution... extemporaneous code ... Your Mileage May Vary™)

The “clever time-saving goodness” here is that, if `$foo` does not yet contain a hash-bucket for a given `$key`, then ... lo! ... magically, it does! And, it is understood to be an (empty) arrayref! And, the desired value magically gets pushed onto it!! (How convenient can you get?)

(In the example above, `$$foo{'bletch'}` is simply a shorthand for `$foo->{'bletch'}`.)

If you combine this trick with the previously mentioned `ORDER BY`, the process actually becomes efficient, because the `$key` values will be arriving in an entirely predictable sequence. Page-faults will therefore be minimized. Even (especially...!) if you have millions of rows in your SQL table, this will make a very noticeable difference in speed.
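The "remember the preceding row's value" idea from the first paragraph is not shown in the code above, so here is a minimal sketch of it. It assumes the rows arrive already sorted by `id`; the `@sorted_rows` array is a stand-in for rows fetched one at a time from a query ending in `ORDER BY id`, and all variable names are illustrative.

```perl
use strict;
use warnings;

# Stand-in for rows fetched from a query that ended in ORDER BY id;
# in real code these would come from fetchrow_hashref one at a time.
my @sorted_rows = (
    { id => '2446438717782054', amount => '2.14'  },
    { id => '2446438717782054', amount => '1.34'  },
    { id => '3938716373870073', amount => '58.80' },
    { id => '3938716373870073', amount => '1.34'  },
);

my @groups;        # finished groups, in arrival order: [ $id, \@rows ]
my $current_id;    # id of the group being collected
my @current_rows;  # rows collected for $current_id so far

for my $row (@sorted_rows) {
    # A change of id means the previous group is complete.
    if (defined $current_id && $row->{id} ne $current_id) {
        push @groups, [ $current_id, [@current_rows] ];
        @current_rows = ();
    }
    $current_id = $row->{id};
    push @current_rows, $row;
}
# Flush the last group, which no later row will close.
push @groups, [ $current_id, [@current_rows] ] if @current_rows;

for my $group (@groups) {
    my ($id, $rows) = @$group;
    printf "%s has %d row(s)\n", $id, scalar @$rows;
}
```

Because each group is complete as soon as the id changes, it could be processed and discarded at that point instead of being stored, so only one group needs to be held in memory at a time. The hash-based version does not depend on ordering at all, but it keeps every row until the loop ends.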