http://qs321.pair.com?node_id=1221726


in reply to which data structure do I need for this grouping problem?

Hello,

I see that you have already received good replies. Here is another way to perform this task. The Data::Table module has many useful methods for manipulating tabular data. In this case, the group method is applicable.

The data.tsv file contains the following tab-delimited data:

    nick    20/5/1950    one
    john    18/2/1980    two
    nick    19/6/1978    three
    nick    20/5/1950    four
    nick    12/9/2000    five
    john    15/6/1997    six
    nick    20/5/1950    seven

This code groups the rows by the first two columns and joins the values from the third column for each group:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Table;

    # Load input data from tsv file
    # The first argument is the file name
    # The second argument specifies that there is no header row (in this case
    # the Data::Table object that is created will have auto-generated column
    # names of col1, col2, etc.)
    my $dt = Data::Table::fromTSV('data.tsv', 0);

    print "The input table is:\n";
    print $dt->tsv, "\n\n";

    # Group by 'col1' and 'col2'
    my $output_t = $dt->group(
        ['col1', 'col2'],   # columns to group by
        ['col3'],           # Columns to perform calculation on
        [ \&join_vals ],    # Apply join_vals function to values found in 'col3'
        ['values']          # Put the joined values into these columns
    );

    print "The output table is:\n";
    print $output_t->tsv, "\n\n";

    sub join_vals {
        my @data = @_;
        return join("|", @data);
    }

    exit;

The output should be:

    The input table is:
    col1    col2         col3
    nick    20/5/1950    one
    john    18/2/1980    two
    nick    19/6/1978    three
    nick    20/5/1950    four
    nick    12/9/2000    five
    john    15/6/1997    six
    nick    20/5/1950    seven

    The output table is:
    col1    col2         values
    nick    20/5/1950    one|four|seven
    john    18/2/1980    two
    nick    19/6/1978    three
    nick    12/9/2000    five
    john    15/6/1997    six
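
For comparison, here is a minimal sketch of the same grouping done without a module, using a hash of arrays keyed on the first two columns. This is not how Data::Table works internally; group_rows is a hypothetical helper name, and the rows are hard-coded here (copied from data.tsv) rather than read from the file, to keep the example self-contained.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::Table): group rows of
# [name, date, value] by name and date, keeping the order in which
# each name/date pair first appears.
sub group_rows {
    my ($rows) = @_;
    my (%values_for, @order);

    for my $row (@$rows) {
        my ($name, $date, $value) = @$row;
        my $key = join "\t", $name, $date;

        # Remember each key the first time it is seen
        push @order, $key unless exists $values_for{$key};
        push @{ $values_for{$key} }, $value;
    }

    # Rebuild one row per group: name, date, values joined with "|"
    return [
        map { [ split(/\t/, $_), join('|', @{ $values_for{$_} }) ] } @order
    ];
}

# Sample rows copied from data.tsv
my @rows = (
    [qw(nick 20/5/1950 one)],
    [qw(john 18/2/1980 two)],
    [qw(nick 19/6/1978 three)],
    [qw(nick 20/5/1950 four)],
    [qw(nick 12/9/2000 five)],
    [qw(john 15/6/1997 six)],
    [qw(nick 20/5/1950 seven)],
);

print join("\t", @$_), "\n" for @{ group_rows(\@rows) };
```

The separate @order array is there because a Perl hash does not preserve insertion order; without it the groups would come out in an arbitrary order.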