Re: Very basic question while reading a file line by line

"Very basic question ..."

Unfortunately, the question itself is too basic. You have omitted information which, if provided, would have resulted in a better answer for you.

Your input appears to be a tab-separated CSV file. Three things suggest this:

You refer to columns, not fields (CSV files have columns). I've added a record to your posted data to demonstrate the difference (see below).
In each record, the second elements are aligned (with tabs?).
You have a header record (which is common for CSV files).

You've said nothing about the encoding of your data. I've used "UTF-8" for both input and output; you may need something else.

Your data seems very simplistic. Is what you posted truly representative of your real data?

I added an extra record to your posted input:

$ cat test_in.csv
id  name
123 john
34  john
567 john
11  peter
899 peter
87  helen
961 Anonymous Monk

Read as plain text with no format defined (and to the extent that whitespace survives being rendered in a webpage), that last record has three whitespace-separated fields; parsed as tab-separated CSV, it has only two columns, just like all of the other records. Here's the file with its control characters revealed ('^I' represents a tab; '$' represents a newline):

$ cat -vet test_in.csv
id^Iname$
123^Ijohn$
34^Ijohn$
567^Ijohn$
11^Ipeter$
899^Ipeter$
87^Ihelen$
961^IAnonymous Monk$
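
To make the fields-versus-columns difference concrete, here's a minimal sketch using plain split on that last record. The variable names are my own and purely illustrative; this only shows the counting, it isn't a substitute for a real CSV parser.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# The last record from test_in.csv: one tab, plus a space inside the name.
my $record = "961\tAnonymous Monk";

# split ' ' breaks on any run of whitespace (tabs and spaces alike).
my @fields = split ' ', $record;

# Splitting on the tab alone keeps "Anonymous Monk" together.
my @columns = split /\t/, $record;

printf "whitespace: %d fields\n",  scalar @fields;     # whitespace: 3 fields
printf "tab:        %d columns\n", scalar @columns;    # tab:        2 columns
```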

Parsing CSV files has many gotchas. Don't try writing your own code to deal with all of these: Text::CSV has already done so; its use is highly recommended. Note that if you, or your users, have Text::CSV_XS installed, it will run faster (without requiring any change to the "use Text::CSV;" statement).
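
As one example of such a gotcha, here's a rough sketch comparing a naive split with Text::CSV on a record whose quoted column contains the separator. The comma-separated sample line is hypothetical (it's not from your data), and the sketch assumes Text::CSV is installed.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use Text::CSV;

# Hypothetical record: the second column contains a comma, so it is quoted.
my $line = '1,"Smith, John",42';

# Naive approach: split on every comma, including the one inside the quotes.
my @naive = split /,/, $line;

# Text::CSV honours the quoting and returns the column intact.
my $csv = Text::CSV->new({ binary => 1 });
$csv->parse($line) or die "Cannot parse line: " . $csv->error_diag;
my @parsed = $csv->fields;

printf "split:     %d pieces\n",  scalar @naive;     # split:     4 pieces
printf "Text::CSV: %d columns\n", scalar @parsed;    # Text::CSV: 3 columns
```

Embedded newlines, escaped quotes and different separators are further cases of the same kind; the module's documentation covers them.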

The code for performing the filtering is fairly straightforward. Here are a few notes:

autodie — let Perl deal with I/O exception handling: it won't get it wrong; it won't forget to do it; it's a tedious task that I'd prefer not to have to do myself.
constant — I like to have named array indices. Possibly overkill in such a tiny script; although, I still think "$row->[NAME]" is immediately clear, while "$row->[1]" may take a moment's thought.
[Aside: Just this week, working with some legacy code, I came across this sort of thing: "$aref->[25]". I was not happy about having to go back several screenfuls and start counting; then check for changes to that count (e.g. via unshift()).]
%seen — that's a standard name and the way I've used it is a standard idiom. You'll see it in lots of code and documentation. Note that the postfix increment is important; the idiom will not work with a prefix increment.
Anonymous block — files are only open for the time they are needed. Perl will automatically close them at the end of the block: another thing I don't need to concern myself with. Note: the automatic closing described only works with lexical filehandles.
$fh_in & $fh_out — lexical filehandles: always prefer these over package variables, such as IN & OUT. They only exist in their scope (the anonymous block in this case) and can't interfere with, or be interfered with by, code elsewhere in the program.
open ‐ always use the 3-argument form, as I have here. You can't specify an encoding layer in the 1- or 2-argument forms. There are other benefits: the documentation has details.
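
On the %seen note: here's a small standalone sketch of why the postfix increment matters. It uses grep over a list of names rather than the "unless" form, which is the same test written differently; the array and the %seen_prefix hash are my own illustrative names.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Names in the order they appear in test_in.csv.
my @names = ('john', 'john', 'john', 'peter', 'peter', 'helen', 'Anonymous Monk');

# Postfix: the test sees the old count (undef, i.e. false, the first time),
# then the count is incremented. Only first occurrences pass.
my %seen;
my @first = grep { !$seen{$_}++ } @names;

# Prefix: the count is incremented before the test, so it is always
# at least 1 (true) and nothing ever passes.
my %seen_prefix;
my @none = grep { !++$seen_prefix{$_} } @names;

print "postfix kept: @first\n";                      # postfix kept: john peter helen Anonymous Monk
print "prefix kept:  ", scalar @none, " names\n";    # prefix kept:  0 names
```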

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;
use constant NAME => 1;

my $infile = 'test_in.csv';
my $outfile = 'test_out.csv';

use Text::CSV;

my %seen;

{
    my $csv = Text::CSV::->new({
        binary => 1, sep_char => "\t", quote_char => undef,
    });
    open my $fh_in, '<:encoding(UTF-8)', $infile;
    open my $fh_out, '>:encoding(UTF-8)', $outfile;
    (undef) = scalar <$fh_in>; # skip & discard header record

    while (my $row = $csv->getline($fh_in)) {
        $csv->say($fh_out, $row) unless $seen{$row->[NAME]}++;
    }
}

Running that gives:

$ cat test_out.csv
123 john
11  peter
87  helen
961 Anonymous Monk

Revealing CSV format:

$ cat -vet test_out.csv
123^Ijohn$
11^Ipeter$
87^Ihelen$
961^IAnonymous Monk$

— Ken
