http://qs321.pair.com?node_id=590938

artist has asked for the wisdom of the Perl Monks concerning the following question:

I have the following structure:
$data = { 1990 => [ 1,2,3], 1991 => [2,3,5], 1992 => [1,2,7,9], .... };
Assume that the keys are years and the lists are ids of some items. I would like to find a local maximum: that is, I select a range of years such as 1990..1999 and want to find out which are the 10 most popular items in that window for a given year. I would like the results in strip fashion for the first k items, so the output should be an 'item' x 'year' table in which each entry tells me whether the item is popular in that year or not. Further sorting so that the rows with the fewest gaps appear at the top would be nice too. For example, for the years 1995..1999, where the last column gives the count for each item over the given years:
ItemID  1995    1996    1997    1998    1999
20      =       =       =       =       =               5
5               =       =       =       =               4
6       =               =       =       =               4
7       =               =       =       =               4
12      =       =       =               =               4
14      =       =       =       =                       4
4       =       =               =                       3
9       =       =                       =               3
10      =               =       =                       3
16              =       =               =               3
1               =       =                               2
2       =               =                               2
3       =                               =               2
8               =                       =               2
11                              =       =               2
13      =                       =                       2
17              =       =                               2
19              =                       =               2
15                              =                       1
18                              =                       1
This is not homework, and I have already written some code. I would like to know whether there are any modules available to achieve what I am looking for.

Thanks,

--Artist

Replies are listed 'Best First'.
Re: Windows Maximums
by moklevat (Priest) on Dec 20, 2006 at 17:08 UTC
    Hi artist,

    Based on the small example you provided, I'm not exactly certain what your data look like, but it seems most likely that you want to generate a frequency distribution of items for each year in the range (i.e. count the number of occurrences of each item in a given year). For this, you might want to look at Statistics::Frequency.
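
    As a rough illustration of what such a frequency distribution means, here is a minimal sketch in core Perl. It does not use Statistics::Frequency; it is a plain hash-counting stand-in. The array @items_1995 and its contents are made up for illustration and assume that an item id can occur more than once within a year.

```perl
use strict;
use warnings;

# Hypothetical sample: item ids recorded for one year, repeats allowed.
my @items_1995 = (20, 5, 20, 7, 5, 20);

# Count how often each id occurs.
my %count;
$count{$_}++ for @items_1995;

# Print ids from most to least frequent, ties broken by ascending id.
for my $id (sort { $count{$b} <=> $count{$a} || $a <=> $b } keys %count) {
    printf "%-6d %d\n", $id, $count{$id};
}
```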

    For your output table, I would recommend checking out Perl6::Form.

Re: Windows Maximums
by GrandFather (Saint) on Dec 20, 2006 at 19:13 UTC

    I'm not sure about modules for the data reduction phase, but here are some hints to head you along the way:

    1. Create a hash, keyed by year, of lists of item id and count pairs (perlref and perllol may help).
    2. For each year, sort the appropriate list by count (see sort for help).
    3. Use an array slice to pick off the top 10 items (see the slices section in perldata).
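
    The three steps above might be sketched roughly as follows. This is only one possible reading of the hints: the sample data in %items_by_year is hypothetical, it assumes ids can repeat within a year so that counts are meaningful, and $top_n is set to 2 rather than 10 to keep the sample small.

```perl
use strict;
use warnings;

# Hypothetical sample data: year => list of item ids, repeats allowed.
my %items_by_year = (
    1995 => [20, 5, 20, 7, 5, 20, 12],
    1996 => [5, 5, 14, 20],
);
my $top_n = 2;    # the question asks for 10

# Step 1: hash by year of [id, count] pairs.
my %pairs_by_year;
for my $year (keys %items_by_year) {
    my %count;
    $count{$_}++ for @{ $items_by_year{$year} };
    $pairs_by_year{$year} = [ map { [ $_, $count{$_} ] } keys %count ];
}

# Steps 2 and 3: sort each year's pairs by count (descending, ties by id),
# then take at most $top_n of them with an array slice.
my %top_by_year;
for my $year (sort keys %pairs_by_year) {
    my @sorted = sort { $b->[1] <=> $a->[1] || $a->[0] <=> $b->[0] }
                 @{ $pairs_by_year{$year} };
    my $last = $#sorted < $top_n - 1 ? $#sorted : $top_n - 1;
    $top_by_year{$year} = [ @sorted[ 0 .. $last ] ];
    print "$year: ",
          join(', ', map { "$_->[0] ($_->[1])" } @{ $top_by_year{$year} }),
          "\n";
}
```

    Clamping the slice with $last avoids undefined entries when a year has fewer than $top_n distinct items.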

    DWIM is Perl's answer to Gödel
Re: Windows Maximums
by johngg (Canon) on Dec 20, 2006 at 20:16 UTC
    I am not sure what modules are out there to help you, though I expect there are some for formatting tables. However, here is a script that seems to produce the required output. I make another hash that associates each item id with the years in which it occurs, then construct a header and a sprintf template; the rest can be done in a sort of Schwartzian Transform.

    use strict;
    use warnings;

    my $rhData = {
        1990 => [1, 2, 3],
        1991 => [2, 3, 5],
        1992 => [1, 2, 7, 9],
        1993 => [3, 7, 8, 13],
        1994 => [1, 4, 8, 12, 17],
        1995 => [2, 3, 4, 6, 7, 10, 12, 13, 14, 20],
        1996 => [1, 4, 5, 8, 9, 12, 14, 16, 17, 19, 20],
        1997 => [1, 2, 5, 6, 7, 10, 12, 14, 16, 17, 20],
        1998 => [4, 5, 6, 7, 10, 11, 13, 14, 15, 18, 20],
        1999 => [3, 5, 6, 7, 8, 9, 11, 12, 16, 19, 20]
    };

    my $rhRevLookup = {};
    foreach my $year (keys %$rhData) {
        $rhRevLookup->{$_}->{$year} ++ for @{$rhData->{$year}};
    }

    my @range = (1995 .. 1999);
    my %frequencies = ();
    $frequencies{$_} ++ for map { (@{$rhData->{$_}}) } @range;

    my $header = q{ItemID};
    $header .= qq{    $_} for @range;
    $header .= qq{    Freq.\n};

    my $template = q{%-10d};
    $template .= q{%-8s} for @range;
    $template .= qq{%-5d\n};

    print $header;
    print
        map { sprintf $template, @$_ }
        map {
            my $raRow = [];
            push @$raRow, $_->[0];
            foreach my $year (@range) {
                push @$raRow, $rhRevLookup->{$_->[0]}->{$year} ? q{=} : q{ };
            }
            push @$raRow, $_->[1];
            $raRow;
        }
        sort { $b->[1] <=> $a->[1] || $a->[0] <=> $b->[0] }
        map { [$_, $frequencies{$_}] }
        keys %frequencies;

    Here is the output.

    ItemID    1995    1996    1997    1998    1999    Freq.
    20        =       =       =       =       =       5
    5                 =       =       =       =       4
    6         =               =       =       =       4
    7         =               =       =       =       4
    12        =       =       =               =       4
    14        =       =       =       =               4
    4         =       =               =               3
    10        =               =       =               3
    16                =       =               =       3
    1                 =       =                       2
    2         =               =                       2
    3         =                               =       2
    8                 =                       =       2
    9                 =                       =       2
    11                                =       =       2
    13        =                       =               2
    17                =       =                       2
    19                =                       =       2
    15                                =               1
    18                                =               1

    I hope this is of use.

    Cheers,

    JohnGG