PerlMonks  

Data Structure Question

by bohrme (Scribe)
on Nov 25, 2009 at 20:34 UTC ( [id://809426]=perlquestion )

bohrme has asked for the wisdom of the Perl Monks concerning the following question:

I think my brain disappeared temporarily.

I'm trying to figure out the best way to organize my data so that I can produce output in a more flexible way.

For example, let's say that I have 3 pieces of data: Employee Number, Form Number, and Date, and I want to be able to produce a summary based on either the Employee Number or the Form Number. E.g., for employee 1, list all the form numbers and when they were signed and, likewise, list the employees that have signed a particular form, with the date signed.

I know that I could stick everything in a sequential array and just offset by three until the offset counter equals the size of the array, but looking at the code is making me ill. Of course, this assumes that every set of data has exactly 3 elements, which is the case here (seems like a poor coding practice to me though).

Unfortunately, I've never really dealt with complex data structures, so I'm a little at a loss as to how to structure this data. A hash of arrays, a hash of hashes, etc.?

If that doesn't make sense here's some test data:

Employee  Form  Date
10001     10    20090101
10002     10    20080515
10003     10    20090323
10001     20    20090412
10002     20    20090711

I'm trying to make the output look something like this:

10001
    10 20090101
    20 20090412
10002
    10 20080515
    20 20090711
10003
    10 20090323

Or

10
    10001 20090101
    10002 20080515
    10003 20090323
20
    10001 20090412
    10002 20090711

Hopefully, that makes more sense than my word-based explanation.

Thanks

Replies are listed 'Best First'.
Re: Data Structure Question
by GrandFather (Saint) on Nov 25, 2009 at 21:12 UTC

    Your data could be stored in many ways. The key to choosing a data structure tends to relate to how you most often want to access it and how easy it is to write and maintain reliable code to manage it.

    You could for example use an array of arrays - one entry per record where each record is an array containing three elements. That's good if you want to process all the data every time you perform a query, but is a maintenance nightmare if you ever need to change the number of fields in a record. Even just coding in the first place can be nasty unless you use named constants to access the individual elements in a record.
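
    A minimal sketch of the array-of-arrays layout with named constants, using the test data from the question. The constant names (EMPLOYEE, FORM, DATE) and the helper forms_for_employee are illustrative assumptions, not something from the original post.

```perl
use strict;
use warnings;

# Named indexes for the three fields (names are illustrative assumptions)
use constant { EMPLOYEE => 0, FORM => 1, DATE => 2 };

# One array reference per record: [employee, form, date]
my @records = (
    [ 10001, 10, 20090101 ],
    [ 10002, 10, 20080515 ],
    [ 10003, 10, 20090323 ],
    [ 10001, 20, 20090412 ],
    [ 10002, 20, 20090711 ],
);

# Every query scans the whole list; here, all records for one employee
sub forms_for_employee {
    my ($employee) = @_;
    return grep { $_->[EMPLOYEE] == $employee } @records;
}

for my $rec ( forms_for_employee(10001) ) {
    printf "%s %s\n", $rec->[FORM], $rec->[DATE];
}
```

    If a fourth field is added later, only the constant list and the places that build records need to change, which is the maintenance point made above.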

    You could use an array of hashes which has most of the advantages of the AOA above, but provides named access to the fields in the records making coding and maintenance easier at the cost of needing more memory for storing the data.

    If you need to access the data by some key field then a HOA or HOH is appropriate.
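
    As a rough illustration of keyed access, here is a sketch of a hash of arrays keyed by employee number, where each array element is a small [form, date] pair. The variable names and the inline @lines list are assumptions made for the example; the values are the test data from the question.

```perl
use strict;
use warnings;

# Test data from the question, one record per string
my @lines = (
    '10001 10 20090101',
    '10002 10 20080515',
    '10003 10 20090323',
    '10001 20 20090412',
    '10002 20 20090711',
);

# Hash of arrays: employee number => list of [form, date] pairs
my %forms_by_employee;
for my $line (@lines) {
    my ( $employee, $form, $date ) = split ' ', $line;
    push @{ $forms_by_employee{$employee} }, [ $form, $date ];
}

# Looking up one employee no longer needs a scan of every record
for my $employee ( sort { $a <=> $b } keys %forms_by_employee ) {
    print "$employee\n";
    for my $pair ( @{ $forms_by_employee{$employee} } ) {
        print "    $pair->[0] $pair->[1]\n";
    }
}
```

    Accessing by form as well would need a second hash keyed by form number, built in the same loop.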

    If you need to access the data by more than one key or there is more data than you really want to fit into memory, then you should use a database. That can actually be a lot simpler than you might think. Consider:

    use strict;
    use warnings;
    use DBI;

    unlink 'db.SQLite';

    # Build the database
    my $dbh = DBI->connect ("dbi:SQLite:dbname=db.SQLite","","");
    $dbh->do ('CREATE TABLE employees (employee TEXT, form TEXT, date TEXT)');

    my $sth = $dbh->prepare ('INSERT INTO employees (employee, form, date) VALUES (?, ?, ?)');
    $sth->execute (do {chomp; split}) while <DATA>;

    print "Access by employee\n";
    $sth = $dbh->prepare (
        'SELECT * FROM employees ORDER BY employee, form, date'
    );
    $sth->execute ();

    my $employee = '';
    while (my $row = $sth->fetchrow_hashref ()) {
        if ($employee ne $row->{employee}) {
            $employee = $row->{employee};
            print "$employee\n";
        }
        printf " %-6s %s\n", @{$row}{qw(form date)};
    }

    print "Access by form\n";
    $sth = $dbh->prepare (
        'SELECT * FROM employees ORDER BY form, employee, date'
    );
    $sth->execute ();

    my $form = '';
    while (my $row = $sth->fetchrow_hashref ()) {
        if ($form ne $row->{form}) {
            $form = $row->{form};
            print "$form\n";
        }
        printf " %-8s %s\n", @{$row}{qw(employee date)};
    }

    __DATA__
    10001 10 20090101
    10002 10 20080515
    10003 10 20090323
    10001 20 20090412
    10002 20 20090711

    Prints:

    Access by employee
    10001
     10     20090101
     20     20090412
    10002
     10     20080515
     20     20090711
    10003
     10     20090323
    Access by form
    10
     10001    20090101
     10002    20080515
     10003    20090323
    20
     10001    20090412
     10002    20090711

    True laziness is hard work
Re: Data Structure Question
by kyle (Abbot) on Nov 25, 2009 at 21:52 UTC

    One way to handle this would be to put your data into a database and query it out. That can be useful especially if you have many many records or you have a data set that grows over time, and you don't want to build it repeatedly.

    I'd probably represent your data with an array of hashes.

    my @records = (
        {
            employee => 10001,
            form     => 10,
            date     => 20090101,
        },
        {
            employee => 10002,
            form     => 10,
            date     => 20080515,
        },
        {
            employee => 10003,
            form     => 10,
            date     => 20090323,
        },
    );

    What's nice about this is that each hash can expand to have more fields as necessary. When you want to summarize by any given field, you can do this:

    sub summarize_by {
        my $field_name = shift @_;
        my %out;
        for my $r ( @records ) {
            push @{ $out{$r->{$field_name}} }, $r;
        }
        return \%out;
    }

    What you'd get from that is a hash of arrays of hashes. Each key of the top level hash is a unique value of the field you specified, and that hash's values are references to an array of records that had that key-value combination.
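
    A sketch of how that hash of arrays of hashes might be walked to print both summaries. The summarize_by routine is repeated from above so the example runs on its own; print_summary is a hypothetical helper, and the two extra records are taken from the test data in the question.

```perl
use strict;
use warnings;

my @records = (
    { employee => 10001, form => 10, date => 20090101 },
    { employee => 10002, form => 10, date => 20080515 },
    { employee => 10003, form => 10, date => 20090323 },
    { employee => 10001, form => 20, date => 20090412 },
    { employee => 10002, form => 20, date => 20090711 },
);

# Same grouping routine as above, repeated so this sketch is self-contained
sub summarize_by {
    my $field_name = shift @_;
    my %out;
    for my $r (@records) {
        push @{ $out{ $r->{$field_name} } }, $r;
    }
    return \%out;
}

# Hypothetical helper: group by one field, then list the other fields
sub print_summary {
    my ( $group_field, @detail_fields ) = @_;
    my $groups = summarize_by($group_field);
    for my $key ( sort { $a <=> $b } keys %$groups ) {
        print "$key\n";
        for my $r ( @{ $groups->{$key} } ) {
            # Hash slice on the record reference picks out the detail fields
            print "    ", join( ' ', @{$r}{@detail_fields} ), "\n";
        }
    }
}

print_summary( 'employee', qw(form date) );
print_summary( 'form',     qw(employee date) );
```

    Because the grouping field is just a string, the same two routines cover both of the layouts asked for in the question.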

Re: Data Structure Question
by zwon (Abbot) on Nov 25, 2009 at 21:29 UTC

    You should use a database as GrandFather suggested, but here's an example of how you could do it using an array of hashes:

    use strict;
    use warnings;
    use 5.010;
    use List::MoreUtils qw(uniq);

    my @data;

    while (<DATA>) {
        my %row;
        @row{qw(employee form date)} = split /\s+/;
        push @data, \%row;
    }

    my @employees = uniq sort map { $_->{employee} } @data;

    for my $employee (@employees) {
        say $employee;
        my @signed = sort { $a->[0] <=> $b->[0] }
          map { [ $_->{form}, $_->{date} ] }
          grep { $_->{employee} == $employee } @data;
        for (@signed) {
            printf "\t%s\t%s\n", @$_;
        }
    }

    __DATA__
    10001 10 20090101
    10002 10 20080515
    10003 10 20090323
    10001 20 20090412
    10002 20 20090711

    Update: kyle suggested a more elegant solution for AoH

Re: Data Structure Question
by scorpio17 (Canon) on Nov 25, 2009 at 21:39 UTC
    Here's my version (hash of hash):
    #!/usr/bin/perl
    use strict;

    my %data_by_employee;
    my %data_by_form;

    # read in tab-delimited data fields into
    # two hashes-of-hashes (actually, two hashes of hash refs)
    while(my $line = <DATA>) {
        chomp $line;
        my ($employee, $form, $date) = split(/\t/,$line);
        next unless ($employee && $form && $date); # skips blank lines
        $data_by_employee{$employee}{$form} = $date;
        $data_by_form{$form}{$employee} = $date;
    }

    print "By Employee:\n";
    for my $employee (sort keys %data_by_employee) {
        print "$employee\n";
        # note: $data_by_employee{$employee} is a hash reference,
        # so we have to dereference it by using %{ }
        for my $form (sort keys %{ $data_by_employee{$employee} } ) {
            my $date = $data_by_employee{$employee}{$form};
            print "\t$form\t$date\n";
        }
    }
    print "\n";

    print "By Form:\n";
    for my $form (sort keys %data_by_form) {
        print "$form\n";
        # note: $data_by_form{$form} is a hash reference,
        # so we have to dereference it by using %{ }
        for my $employee (sort keys %{ $data_by_form{$form} } ) {
            my $date = $data_by_form{$form}{$employee};
            print "\t$employee\t$date\n";
        }
    }
    print "\n";

    __DATA__
    10001	10	20090101
    10002	10	20080515
    10003	10	20090323
    10001	20	20090412
    10002	20	20090711
Re: Data Structure Question
by bichonfrise74 (Vicar) on Nov 25, 2009 at 22:42 UTC
    Try this.
    #!/usr/bin/perl
    use strict;
    use Data::Dumper;

    my %record;
    while (<DATA>) {
        next if ( /^Employee/ );
        my ($employee, $form, $date) = split( /\s+/ );
        $record{$employee}->{$form} = $date;
    }

    print Dumper \%record;

    __DATA__
    Employee Form Date
    10001 10 20090101
    10002 10 20080515
    10003 10 20090323
    10001 20 20090412
    10002 20 20090711
