PerlMonks  

Is this a reasonable data structure?

by Theo (Priest)
on Oct 23, 2003 at 20:05 UTC

Theo has asked for the wisdom of the Perl Monks concerning the following question:

Hello again, wise ones.
For the exercise and for the utility, I want to try building simple web pages from data in a flat file. Later, when I understand things better, I will use CGI.pm where appropriate. Different scripts would use differing parts of the data - not all of the data would be needed for a Name/Phone Number list, for example.

The data file would look something like this, pipe separated values (would that be PSV?) with a max of maybe 50 records instead of three. (In real life there would be many more fields, too.)

    title|first|last|room|phone|email
    Mrs|Linda|Caralo|201|148|she@borg.org
    Miss|Jean|Androno|317|167|j@alo.com
    Mr|Steve|Paterman|101|100|steve@net.net

My question concerns using a reasonable data structure within the perl scripts. As a novice, the best I have come up with is an array of hashes as follows:

    my %caralo_l (
        “title” => “Mrs”,
        “first” => “Linda”,
        “last”  => “Caralo”,
        “room”  => “201”,
        “phone” => “148”,
        “email” => “she@borg.org”,
    );
    my %androno_j (
        “title” => “Miss”,
        “first” => “Jean”,
        “last”  => “Andronlo”,
        “room”  => “317”,
        “phone” => “167”,
        “email” => “j@alo.com”,
    );
    my %paterman_s (
        “title” => “Mr”,
        “first” => “Steve”,
        “last”  => “Paterman”,
        “room”  => “101”,
        “phone” => “100”,
        “email” => “stv@net.net”,
    );
    my @alldata (%caralo_l, %androno_j, %paterman_s);

It seems to me kind of a waste to repeat the KEY information in each hash, but I haven’t thought up a better way.
Question: Is this really an array of hashes?
Question: Is this a reasonable approach?

-theo-
(so many nodes and so little time ... )
Note: All opinions are untested, unless otherwise stated

Replies are listed 'Best First'.
Re: Is this a reasonable data structure?
by sauoq (Abbot) on Oct 23, 2003 at 20:31 UTC
    my @alldata (%caralo_l, %androno_j, %paterman_s);
    ...
    Question: Is this really an array of hashes?

    Well, hardburn almost had it. It's actually just an array. Perl will flatten your hashes. If you wrote it as

    my @alldata = (\%caralo_l, \%androno_j, \%paterman_s);
    you would have an array of hashes.
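
The difference between the two assignments can be seen in a minimal sketch. It uses cut-down versions of the hashes from the question (two fields each, an assumption made only to keep the example short) and correct quoting:

```perl
use strict;
use warnings;

my %caralo_l  = ( first => 'Linda', last => 'Caralo' );
my %androno_j = ( first => 'Jean',  last => 'Androno' );

# In list context each hash is flattened into its key/value pairs.
my @flat = ( %caralo_l, %androno_j );
print scalar(@flat), "\n";         # 8 (4 keys plus 4 values)

# The backslash takes a reference, so each hash stays intact.
my @alldata = ( \%caralo_l, \%androno_j );
print scalar(@alldata), "\n";      # 2
print ref( $alldata[0] ), "\n";    # HASH
print $alldata[1]{last}, "\n";     # Androno
```

Only the second array lets you get back to an individual record with `$alldata[$i]{field}`.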

    It seems to me kind of a waste to repeat the KEY information in each hash, but I haven’t thought up a better way.

    Yes, it is a waste. You might consider just using an array of arrays and then keeping a hash where the keys are your column names and the values are their indexes in the array.

    Edit: Added the missing '=' in the assignment. I didn't notice its absence, at first, after cutting and pasting it from the OP.

    -sauoq
    "My two cents aren't worth a dime.";
    
Re: Is this a reasonable data structure?
by hardburn (Abbot) on Oct 23, 2003 at 20:13 UTC

    Is this really an array of hashes

    No, it's a hash-of-hashes:

        my %data = (
            paterman_s => {
                title => 'Mr',
                first => 'Steve',
                last  => 'Paterman',
                room  => 101,
                phone => 100,
                email => 'stv@net.net',
            },
            # And so on
        );

    Also, please watch the weird Microsoft quoting characters.

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    :(){ :|:&};:

    Note: All code is untested, unless otherwise stated

Re: Is this a reasonable data structure?
by tadman (Prior) on Oct 23, 2003 at 20:32 UTC
    Keep in mind that your assignment is totally broken. Missing equals sign aside, see what this does:
    my @array = (%hash1, %hash2);
    What you end up with is merely a list of the key/value pairs from %hash1 and %hash2, not an array of hashes.

    As hardburn suggested, what you want is a hash of hashes (HoH):
        use warnings;
        use strict;

        my @template;
        my %data;

        while (<>) {
            chomp;
            my @line = split(/\|/, $_);

            if (!@template) {
                @template = @line;
            }
            else {
                my $key = lc($line[2]."_".substr($line[1],0,1));
                @{$data{$key}}{@template} = @line;
            }
        }

        use Data::Dumper;
        print Dumper(\%data);
    I really have no idea how you were going to name your hashes like that, so I guessed. Note that you might have to fix the $key definition so that two "B.Smith" people don't collide.
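
One possible way to avoid the collision is to add a numeric suffix whenever a key is already taken. This is only a sketch: `unique_key` is a hypothetical helper, not something from the code above, and the two Smiths are invented test data.

```perl
use strict;
use warnings;

my %data;

# Hypothetical helper: build "last_f" and append _2, _3, ... if it is taken.
sub unique_key {
    my ( $data, $first, $last ) = @_;
    my $base = lc( $last . '_' . substr( $first, 0, 1 ) );
    my $key  = $base;
    my $n    = 1;
    $key = $base . '_' . ++$n while exists $data->{$key};
    return $key;
}

# Invented sample names; the first two would otherwise share a key.
for my $person ( [ 'Bob', 'Smith' ], [ 'Barbara', 'Smith' ], [ 'Linda', 'Caralo' ] ) {
    my ( $first, $last ) = @$person;
    my $key = unique_key( \%data, $first, $last );
    $data{$key} = { first => $first, last => $last };
}

print join( ', ', sort keys %data ), "\n";    # caralo_l, smith_b, smith_b_2
```

The suffix depends on input order, so if the keys need to be stable across runs a genuinely unique field (an ID or the email address) would be a better choice.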
Re: Is this a reasonable data structure?
by etcshadow (Priest) on Oct 23, 2003 at 20:46 UTC
    Bearing in mind (as mentioned above) that these should be arrays of hash-references and not hashes (see docs for perlref and perlreftut)...

    It's a perfectly reasonable representation for the data, depending on how you want to handle the data. In my own work, I very frequently deal with arrays of hashrefs. Granted, though, it's not the only way, or necessarily the best. That's going to depend on what you want to do with it.

    This can be a good way to represent very simple objects or rows of a database table. However, if you want to have a data structure which, in and of itself, enforces the homogeneity of the individual rows/objects, then you can go with an array of array-refs:

        my @data = (
            [ 'Mrs',  'Linda', 'Caralo',  '201', '148', 'she@borg.org' ],
            [ 'Miss', 'Jean',  'Androno', '317', '167', 'j@alo.com' ],
            # etc
        );
    And, if you like, you can keep the column name => column index map in a hash:
        my %column = (
            title => 0,
            first => 1,
            last  => 2,
            room  => 3,
            phone => 4,
            email => 5,
        );
    and then you can reference the items in a row:
        foreach my $row (@data) {
            print "$row->[$column{title}] $row->[$column{first}] $row->[$column{last}]\n";
        }
    There's also something in Perl called a "pseudo-hash", which is a means by which the language does what I showed above (using a name as an index in an array, rather than a number), but I'd avoid them if possible.

    Anyway, that (above) is just one other example of how you might store a table... ultimately the "best" method for storing your data table will be dictated by what you intend to do with it. However, the array of hashrefs is about the simplest, most flexible way (though some people might gripe about the "wasteful"ness (both in space and time) of using a hash-lookup for each element of each row).
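
For completeness, here is a minimal sketch of loading the pipe-separated data into an array of hashrefs and printing a name/phone list from it. The records are the ones from the original post; holding them in a string and reading through an in-memory filehandle is an assumption made so the example is self-contained (a real script would open the data file instead).

```perl
use strict;
use warnings;

# Sample records from the original post, held in a string for the demo.
my $text = <<'END';
title|first|last|room|phone|email
Mrs|Linda|Caralo|201|148|she@borg.org
Miss|Jean|Androno|317|167|j@alo.com
Mr|Steve|Paterman|101|100|steve@net.net
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

# The first line supplies the field names.
chomp( my $header = <$fh> );
my @fields = split /\|/, $header;

my @people;
while ( my $line = <$fh> ) {
    chomp $line;
    my %rec;
    @rec{@fields} = split /\|/, $line;    # hash slice: field names => values
    push @people, \%rec;                  # store a reference, not the hash
}
close $fh;

# A name/phone list needs only three of the six fields.
print "$_->{first} $_->{last}: $_->{phone}\n" for @people;
```

Because the field names come from the header line, adding a column to the file needs no change to the loading code.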


    ------------
    :Wq
    Not an editor command: Wq
Re: Is this a reasonable data structure?
by Roger (Parson) on Oct 24, 2003 at 00:34 UTC
    It seems that nobody has pointed this out yet: there is a convenient way to access data stored in your flat file, namely the DBI and DBD::CSV modules.

        use strict;
        use DBI;
        use DBD::CSV;
        use Data::Dumper;

        # Connect to CSV database
        my $dbh = DBI->connect("DBI:CSV:csv_sep_char=\|")
            or die "Cannot connect: " . $DBI::errstr;
        $dbh->{'csv_tables'}->{'addressbook'} = {'file'=>'addressbook.txt' };

        # load address book entries
        my $sth = $dbh->prepare("SELECT * FROM addressbook");
        $sth->execute();

        # store data in 2-tier hash table
        my %data;
        while (my $res = $sth->fetchrow_hashref())  # loop through data
        {
            # create hash to store details
            my %rec = map { $_ => $res->{$_} } @{$sth->{NAME}};
            # create top level hash with last name as lookup key
            $data{$rec{"last"}} = \%rec;
        }

        # cleaning up
        $sth->finish;
        $dbh->disconnect;

        # inspect our result
        print Dumper(\%data);
    The data file -
        addressbook.txt
        ---------------
        title|first|last|room|phone|email
        Mrs|Linda|Caralo|201|148|she@borg.org
        Miss|Jean|Androno|317|167|j@alo.com
        Mr|Steve|Paterman|101|100|steve@net.net
    And the hash structure built with the above script:
        $VAR1 = {
            'Caralo' => {
                'email' => 'she@borg.org',
                'first' => 'Linda',
                'last' => 'Caralo',
                'title' => 'Mrs',
                'phone' => '148',
                'room' => '201'
            },
            'Paterman' => {
                'email' => 'steve@net.net',
                'first' => 'Steve',
                'last' => 'Paterman',
                'title' => 'Mr',
                'phone' => '100',
                'room' => '101'
            },
            'Androno' => {
                'email' => 'j@alo.com',
                'first' => 'Jean',
                'last' => 'Androno',
                'title' => 'Miss',
                'phone' => '167',
                'room' => '317'
            }
        };

      Or he could do it in a quarter of the time with half the code, without having to download and compile modules or read 1000 lines of documentation and learn SQL.

          #! perl -slw
          use strict;
          use Data::Dumper;

          my @fields = split'\|', <DATA>;
          chomp $fields[-1];

          my %HoH = map{
              chomp;
              my%h;
              @h{ @fields } = split'\|';
              ( $h{ last } . '_' . substr( $h{ first }, 0, 1 ) => \%h )
          } <DATA>;

          print Dumper \%HoH;

          __DATA__
          title|first|last|room|phone|email
          Mrs|Linda|Caralo|201|148|she@borg.org
          Miss|Jean|Androno|317|167|j@alo.com
          Mr|Steve|Paterman|101|100|steve@net.net

      prints

          P:\test>test2
          $VAR1 = {
              'Paterman_S' => {
                  'email' => 'steve@net.net',
                  'first' => 'Steve',
                  'last' => 'Paterman',
                  'title' => 'Mr',
                  'phone' => '100',
                  'room' => '101'
              },
              'Caralo_L' => {
                  'email' => 'she@borg.org',
                  'first' => 'Linda',
                  'last' => 'Caralo',
                  'title' => 'Mrs',
                  'phone' => '148',
                  'room' => '201'
              },
              'Androno_J' => {
                  'email' => 'j@alo.com',
                  'first' => 'Jean',
                  'last' => 'Androno',
                  'title' => 'Miss',
                  'phone' => '167',
                  'room' => '317'
              }
          };

      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      Hooray!

      DBD::CSV++

      However, you are doing too much work:

          ...
          # load address book entries
          my $sth = $dbh->prepare('SELECT * FROM addressbook');
          $sth->execute();

          my %data = map {$_->{last} => $_} @{$sth->fetchall_arrayref({})};
          print Dumper \%data;

      jeffa

      L-LL-L--L-LL-L--L-LL-L--
      -R--R-RR-R--R-RR-R--R-RR
      B--B--B--B--B--B--B--B--
      H---H---H---H---H---H---
      (the triplet paradiddle with high-hat)
      
      Along the same lines, but a little easier if you don't know SQL, you can try using AnyData.

      -Nathan

Re: Is this a reasonable data structure?
by Art_XIV (Hermit) on Oct 23, 2003 at 20:56 UTC

    What you are probably going to want is a hash of hashes, as long as you can count on the keys of your top-level hash being unique, and especially if those keys are something you will frequently use for lookups.

    In a hash of hashes you'd load and access your data in a pattern similar to:

        $person->{'caralo_l'}{'title'}   = "Mrs";
        $person->{'androno_j'}{'room'}   = "317";
        $person->{'paterman_s'}{'last'}  = "Paterman";
        $person->{'paterman_s'}{'email'} = 'stv@net.net';  # single quotes, so @net is not interpolated

    The main downside to an array of hashes is that it (the array) will have to be traversed to find specific entries in the hashes. Of course, the top-level hash in a hash of hashes will have to be traversed if the keys aren't being helpful.
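
The traversal cost can be illustrated with a small sketch. The three records are cut-down versions of the ones in the question, and keying on last name assumes last names are unique, which may not hold for real data:

```perl
use strict;
use warnings;

my @people = (
    { first => 'Linda', last => 'Caralo',   room => 201 },
    { first => 'Jean',  last => 'Androno',  room => 317 },
    { first => 'Steve', last => 'Paterman', room => 101 },
);

# Array of hashes: grep has to test every element to find a match.
my ($found) = grep { $_->{last} eq 'Androno' } @people;
print "scan:   room $found->{room}\n";             # scan:   room 317

# Hash of hashes keyed on last name: one direct lookup, no scan.
my %by_last = map { ( $_->{last} => $_ ) } @people;
print "lookup: room $by_last{Androno}{room}\n";    # lookup: room 317
```

Both structures share the same inner hashes here, since `%by_last` stores references to the records already held in `@people`.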

    Check out Randal Schwartz's 'Learning Perl Objects, References and Modules', if you haven't done so, for its very lucid discussions of references and data structures.

      If you are going for speed of retrieval, then you could do an LoL (list of lists) and then have a hash table to look up indexes. Then you could have hashes that are keyed on first name, last name, a combination, or anything else. This can be a rather phone problem. The sick reason I enjoyed the data structures class :-)
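
A rough sketch of that idea, assuming the records are already loaded into a list of lists. The names `%col`, `%by_last` and `%by_room` are illustrative only; each index maps a field value to the row numbers that contain it, so duplicate values are allowed.

```perl
use strict;
use warnings;

# Column name => position in each row.
my %col = ( title => 0, first => 1, last => 2, room => 3, phone => 4, email => 5 );

my @rows = (
    [ 'Mrs',  'Linda', 'Caralo',   201, 148, 'she@borg.org' ],
    [ 'Miss', 'Jean',  'Androno',  317, 167, 'j@alo.com' ],
    [ 'Mr',   'Steve', 'Paterman', 101, 100, 'steve@net.net' ],
);

# Secondary indexes: field value => list of row numbers.
my ( %by_last, %by_room );
for my $i ( 0 .. $#rows ) {
    push @{ $by_last{ $rows[$i][ $col{last} ] } }, $i;
    push @{ $by_room{ $rows[$i][ $col{room} ] } }, $i;
}

# Look up by room number without scanning @rows.
for my $i ( @{ $by_room{317} || [] } ) {
    my $row = $rows[$i];
    print join( ' ', @{$row}[ @col{qw(first last)} ] ), "\n";    # Jean Androno
}
```

Each extra index costs memory and has to be kept in step with `@rows` if records are added or removed, so it is probably only worth building the ones you actually search on.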


      ___________
      Eric Hodges
Re: Is this a reasonable data structure?
by Theo (Priest) on Oct 25, 2003 at 00:48 UTC
    I was expecting a trickle of information, but find I'm overwhelmed by the deluge. I think I understand about 20% of what y'all have suggested. I'll be sitting down with the Llama and Camel as I read through your replies. There is so much to absorb here, understanding it will be a growth experience.
    Thank You all.

    -theo-
    (so many nodes and so little time ... )
    Note: All opinions are untested, unless otherwise noted

Re: Is this a reasonable data structure?
by eric256 (Parson) on Oct 23, 2003 at 20:37 UTC

    You could instead use an array of arrays.

        use strict;
        use warnings;
        use Data::Dumper;

        my $i = 0;
        my $line = <DATA>;
        chomp($line);

        my $colums = {               # hash ref to hold column numbers
            map { $_ => $i++ }       # map each column to a hash and give it the index
            split(/\|/,$line)        # split it on the pipe and send it to map
        };

        print Dumper($colums);

        my $rows;
        foreach my $line (<DATA>) {
            chomp $line;
            push @$rows, [split(/\|/, $line)];
        }

        print "Record 1: first = " . $rows->[0][$colums->{first}];

        __DATA__
        title|first|last|room|phone|email
        Mrs|Linda|Caralo|201|148|she@borg.org
        Miss|Jean|Androno|317|167|j@alo.com
        Mr|Steve|Paterman|101|100|steve@net.net

    That way you don't reproduce the column information. There's probably a better way to do this, but here was my whack at it.


    ___________
    Eric Hodges
Re: Is this a reasonable data structure?
by Anonymous Monk on Oct 24, 2003 at 12:25 UTC
    "It seems to me kind of a waste to repeat the KEY information in each hash, but I haven’t thought up a better way."
    But that way it can go right into HTML::Template!
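
A minimal sketch of what that could look like, assuming the HTML::Template module from CPAN is installed. The template text and the loop name `people` are made up for this example; the point is only that a `TMPL_LOOP` takes a reference to an array of hash references, which is the structure discussed in this thread.

```perl
use strict;
use warnings;
use HTML::Template;    # CPAN module, not part of core Perl

my @people = (
    { first => 'Linda', last => 'Caralo',   phone => 148 },
    { first => 'Jean',  last => 'Androno',  phone => 167 },
    { first => 'Steve', last => 'Paterman', phone => 100 },
);

# Illustrative template; the loop name "people" is an assumption of this sketch.
my $tmpl_text = <<'END';
<ul>
<TMPL_LOOP NAME="people"><li><TMPL_VAR NAME="first"> <TMPL_VAR NAME="last">: <TMPL_VAR NAME="phone"></li>
</TMPL_LOOP></ul>
END

my $template = HTML::Template->new(
    scalarref         => \$tmpl_text,
    die_on_bad_params => 0,    # tolerate record fields the template does not use
);

# TMPL_LOOP takes a reference to an array of hash references.
$template->param( people => \@people );
print $template->output;
```

With `die_on_bad_params` turned off, the full six-field records could be passed as they are even though the template only uses three of the fields.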

Node Type: perlquestion [id://301693]
Approved by Paladin