Re: Filtering text file made into hash

You're almost there! But as my anonymous brother pointed out, you'll have to think about what you want to achieve and how you want to do it before actually coding it up.

You want to print names if they appear at least four times in the file. Just the names, or their associated lengths as well? If the latter, do you want to print all lengths, so long as at least one is greater than or equal to 50? And how often do you want to print them?

For instance, take your sample data. Here's a variety of possible outputs I could think of:

"CA", once, because that name appears at least four times and has at least one length >= 50.
"CA", four times, since that name appears four times and has an associated length >= 50 four times.
"CA", four times, since that name appears four times and has an associated length >= 50 at least once.
"CA", followed by all the associated lengths, regardless of whether they're >= 50.
"CA", followed by only the associated lengths that are >= 50.
...
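
To make the difference between the last two options concrete, here's a rough sketch. The lengths are made up for illustration (one of them is deliberately below 50), and @lengths and @big are just names I picked:

```perl
use strict;
use warnings;
use feature 'say';

# made-up lengths for "CA"; 12 is below the threshold on purpose
my @lengths = (57, 12, 99, 104);

# keep only the lengths that are >= 50
my @big = grep { $_ >= 50 } @lengths;

say "CA: ", join ",", @lengths;   # all lengths, regardless of size
say "CA: ", join ",", @big;       # only the lengths >= 50
```

Either way the selection happens at output time, so the stored data stays the same whichever variant you pick.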

I usually go for a general solution that allows me to tweak the output as desired later on without having to change the crunching (much). Which brings me to another important remark - the overall structure for this sort of program is generally:

1. Read input into the right kind of data structure
2. Process
3. Output

This may seem trivial, especially since there's no processing going on here (I don't count selecting what to output as processing, really), but it helps to separate these steps. For instance, what kind of data structure is right?

You're dealing with key/value pairs here, so it'll likely be something involving a hash. Without knowing exactly what data you need to preserve, I'd suggest simply saving ALL the lengths for each name, in the order they appear in the input file. (As an added benefit, counting the number of lengths for a given name will then also tell you how often that name appeared in your input.) So, use a hash of arrays. perldsc, the Perl Data Structures Cookbook, may be of help there.
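
In case the term is new to you, here's a minimal sketch of what a hash of arrays looks like in practice. It's hand-fed with a few values rather than reading a file, and $ca_count is just an illustrative name:

```perl
use strict;
use warnings;
use feature 'say';

my %names;

# push creates the inner array automatically the first time a name is seen
push @{ $names{CA} }, 57;
push @{ $names{MO} }, 22;
push @{ $names{CA} }, 88;

# an array in scalar context yields its element count,
# which is also how often the name appeared
my $ca_count = scalar @{ $names{CA} };

say "CA appeared $ca_count times: @{ $names{CA} }";
```

Each hash value is a reference to an array, so @{ $names{CA} } gets you back the list of lengths for that name.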

Note also that you can use split to break each input line at the comma, separating the name from the length, and any from the core module List::Util (available since List::Util 1.33) to test whether any length for a given name is >= 50. With that, here's my starting point for a solution:

#!/usr/bin/perl

use Modern::Perl '2014';

# core modules
use List::Util qw/any/;

# this will hold the data read
my %names = ();

# 1. read input data
while(<DATA>) {
    chomp;

    # split $_ along commas, returning at most 2 pieces
    my ($name, $length) = split /,/, $_, 2;

    # save $length for $name
    push @{ $names{$name} }, $length;
}

# 2. processing - none

# 3. select what to output
foreach my $name (sort keys %names) {

    # did $name appear at least four times?
    if(scalar @{ $names{$name} } >= 4) {

        # is at least one of the associated lengths >= 50?
        if(any { $_ >= 50 } @{ $names{$name} }) {

            say "$name: ", join ",", @{ $names{$name} };

        }
    }
}

__DATA__
CA,57
MO,22
CA,88
CA,99
NC,34
CA,104

This outputs:

$ perl 1146088.pl
CA: 57,88,99,104
$

BTW - I left out the file-handling code on purpose here and read from __DATA__ (see Special Literals in perldata for more on that) instead, to focus on the important bits. You'll know what to do. :) (In general it's perhaps simpler/better to read from <<>> anyway, or plain <> on perls older than 5.22, and use the shell to redirect input and output as desired.)
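
If you'd rather keep the reading step independent of where the data comes from, one option is to wrap it in a sub that takes a filehandle. This is only a sketch: read_names is a hypothetical helper name, and the demo feeds it an in-memory filehandle so it runs without any file on disk.

```perl
use strict;
use warnings;

# hypothetical helper: read "name,length" lines from any filehandle
# and return a reference to a hash of arrays
sub read_names {
    my ($fh) = @_;
    my %names;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;    # skip blank lines
        my ($name, $length) = split /,/, $line, 2;
        push @{ $names{$name} }, $length;
    }
    return \%names;
}

# demo with an in-memory filehandle; a real script might pass \*STDIN
# or a handle opened on an actual file instead
my $sample = "CA,57\nMO,22\nCA,88\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $names = read_names($fh);
close $fh;

print "$_: @{ $names->{$_} }\n" for sort keys %$names;
```

The selection and output code then only ever sees the hash, whatever the input source was.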

Re^2: Filtering text file made into hash
by GrandFather (Saint) on Oct 27, 2015 at 22:36 UTC

my %names = ();

is better as:

my %names;

Hashes and arrays are made fresh (and empty) when they are declared so you don't need to clutter code with redundant initialization.

Avoid nested blocks. Your for loop is better written:

for my $name (sort keys %names) {
    # Skip if fewer than 4 occurrences of name or no length is at least 50
    next if @{$names{$name}} < 4 || !any {$_ >= 50} @{$names{$name}};

    say "$name: ", join ",", @{$names{$name}};
}

Use early exits to avoid nesting in loops and subs. Nested blocks make logic flow much harder to analyze. Early exits give you a simple, easy-to-follow list of test/handle/bail steps.
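
The same idea applies to subs. As a rough sketch, the selection test could be pulled into a helper that bails out as soon as a condition fails. The name wanted is made up here, and the thresholds (4 occurrences, length >= 50) are the ones from the question:

```perl
use strict;
use warnings;
use List::Util qw/any/;

# hypothetical helper: takes an array reference of lengths and
# returns true if the name they belong to should be reported
sub wanted {
    my ($lengths) = @_;

    return 0 if @$lengths < 4;                     # too few occurrences
    return 0 unless any { $_ >= 50 } @$lengths;    # no length of at least 50

    return 1;
}

print wanted([57, 88, 99, 104]) ? "report\n" : "skip\n";
print wanted([57, 88, 99])      ? "report\n" : "skip\n";
```

Each test sits on its own line at the same indentation level, so adding or removing a condition later doesn't disturb the rest.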

Premature optimization is the root of all job security
