Very basic question while reading a file line by line

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Very basic question while reading a file line by line by GrandFather (Saint) on Dec 10, 2022 at 04:21 UTC
Your sample data smells like CSV. If that is the case, you really should use Text::CSV to read the file. That aside, I'd be much happier to see some code that you have attempted to use, and a description of how it fails, than a simple cap-in-hand request for us to do your homework for you. Oh, and Anonymous Monk doesn't get to consider itself a fellow monk in my book. To gain that qualification you should join the monastery.

Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: Very basic question while reading a file line by line by atcroft (Abbot) on Dec 10, 2022 at 01:45 UTC
Q: How would you do it manually?
A: You would remember which names you have already seen, and only output a line if its name were not in that list.

Q: How do you do that in code?
A: One way is to put each name into a hash, and only output the line if the name is not already present.

In my example code (which uses data from an array instead of a file, but the logic within the while loop is the same as if processing a file), I split each line into two (2) parts based on m/\s+/ (one or more whitespace characters, which could be spaces, tabs, etc.). I then check whether the name exists as a key in the hash (%seen); if not, I output the line. After the check, I increment the value of the hash element with the name as the key.

Output:

```
$ ./11148698-00.pl
id name
123 john
11 peter
87 helen
```

Hope that helps.
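atcroft's own code is collapsed in the original post, so here is a minimal sketch of the approach described above, not the original script. It assumes the thread's sample lines held in an array; replacing `shift @lines` with a read from a filehandle (`while (my $line = <$fh>)`) gives the file-based version with the same loop body. Like the output shown above, it keeps the header line, because "name" is simply the first name seen.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample lines held in an array (assumed data, mirroring the thread's
# sample). The body of the while loop is what you would use when
# reading a filehandle line by line.
my @lines = (
    "id name\n",
    "123 john\n",
    "34 john\n",
    "567 john\n",
    "11 peter\n",
    "899 peter\n",
    "87 helen\n",
);

my %seen;    # names already output (hash keys)
my @kept;    # lines that were output, kept for inspection

while ( defined( my $line = shift @lines ) ) {
    chomp $line;

    # Split into two parts on the first run of whitespace
    # (spaces, tabs, etc.): an id and a name.
    my ( $id, $name ) = split m/\s+/, $line, 2;

    # Output the line only if this name has not been seen yet.
    if ( !exists $seen{$name} ) {
        print "$line\n";
        push @kept, $line;
    }

    # Record the name after the check (counts its occurrences).
    $seen{$name}++;
}
```

Splitting with a limit of 2 keeps a name containing spaces (such as "Anonymous Monk") in one piece.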
Re: Very basic question while reading a file line by line by kcott (Archbishop) on Dec 10, 2022 at 05:48 UTC
"Very basic question ..."

Unfortunately, the question itself is too basic. You have omitted information which, if provided, would have resulted in a better answer for you.

Your input appears to be a tab-separated CSV file. Three things suggest this:

- You refer to columns, not fields (CSV files have columns). I've added a record to your posted data to demonstrate the difference (see below).
- In each record, the second elements are aligned (with tabs?).
- You have a header record (which is common for CSV files).

You've said nothing about the encoding of your data. I've used "UTF-8" for both input and output; you may need something else.

Your data seems very simplistic. Is what you posted truly representative of your real data?

I added an extra record to your posted input:

```
$ cat test_in.csv
id      name
123     john
34      john
567     john
11      peter
899     peter
87      helen
961     Anonymous Monk
```

In a normal file, with no special format defined, and to the extent that it's represented in a webpage, that last record has three fields; however, if a CSV format is specified, that last record has only two columns, just like all of the other records. Here's the CSV format revealed (`^I` represents a tab; `$` represents a newline):

```
$ cat -vet test_in.csv
id^Iname$
123^Ijohn$
34^Ijohn$
567^Ijohn$
11^Ipeter$
899^Ipeter$
87^Ihelen$
961^IAnonymous Monk$
```

Parsing CSV files has many gotchas. Don't try writing your own code to deal with all of these: Text::CSV has already done so; its use is highly recommended. Note that if you, or your users, have Text::CSV_XS installed, it will run faster (without requiring any change to the `use Text::CSV;` statement).

The code for performing the filtering is fairly straightforward. Here are a few notes:

- autodie: lets Perl deal with I/O exception handling. It won't get it wrong; it won't forget to do it; and it's a tedious task that I'd prefer not to have to do myself.
- constant: I like to have named array indices.
  Possibly overkill in such a tiny script; however, I still think `$row->[NAME]` is immediately clear, while `$row->[1]` may take a moment's thought. [Aside: Just this week, working with some legacy code, I came across this sort of thing: `$aref->[25]`. I was not happy about having to go back several screenfuls and start counting, then check for changes to that count (e.g. via `unshift()`).]
- `%seen`: that's a standard name, and the way I've used it is a standard idiom. You'll see it in lots of code and documentation. Note that the postfix increment is important; the idiom will not work with a prefix increment.
- Anonymous block: files are only open for as long as they are needed. Perl will automatically close them at the end of the block: another thing I don't need to concern myself with. Note: the automatic closing described only works with lexical filehandles.
- `$fh_in` & `$fh_out`: lexical filehandles. Always prefer these over global bareword filehandles, such as `IN` & `OUT`. They only exist in their scope (the anonymous block in this case) and can't interfere with, or be interfered with by, code elsewhere in the program.
- open: always use the 3-argument form, as I have here. You can't specify the encoding in the 1- or 2-argument forms. There are other benefits: the documentation has details.
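A small standalone aside (not part of kcott's script) on the `%seen` note above: why the postfix increment matters. This sketch uses made-up names and runs both forms side by side. Per perlop, a post-increment of an undefined hash value returns 0, a false value, so `unless` lets the first occurrence through; a pre-increment returns the new value, 1, which is already true on the first occurrence.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @names = qw( john john peter );    # assumed sample names

my ( %seen_post, %seen_pre );
my ( @post, @pre );

for my $name (@names) {
    # Postfix: returns the OLD value (0 on first sight, false)
    # and then increments, so the first occurrence is kept.
    push @post, $name unless $seen_post{$name}++;

    # Prefix: increments FIRST and returns the NEW value (1 on first
    # sight, true), so no occurrence is ever kept.
    push @pre, $name unless ++$seen_pre{$name};
}

print "postfix kept: @post\n";    # prints "postfix kept: john peter"
print "prefix kept:  @pre\n";     # nothing is listed after the label
```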
```perl
#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

use constant NAME => 1;

my $infile  = 'test_in.csv';
my $outfile = 'test_out.csv';

use Text::CSV;

my %seen;

{
    my $csv = Text::CSV::->new({
        binary     => 1,
        sep_char   => "\t",
        quote_char => undef,
    });

    open my $fh_in,  '<:encoding(UTF-8)', $infile;
    open my $fh_out, '>:encoding(UTF-8)', $outfile;

    (undef) = scalar <$fh_in>; # skip & discard header record

    while (my $row = $csv->getline($fh_in)) {
        $csv->say($fh_out, $row) unless $seen{$row->[NAME]}++;
    }
}
```

Running that gives:

```
$ cat test_out.csv
123     john
11      peter
87      helen
961     Anonymous Monk
```

Revealing CSV format:

```
$ cat -vet test_out.csv
123^Ijohn$
11^Ipeter$
87^Ihelen$
961^IAnonymous Monk$
```

Ken
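To make the fields-versus-columns point above concrete, here is a minimal sketch splitting kcott's extra record two ways. It assumes a literal tab between the columns. A whitespace split (as used in some of the other replies) yields three fields, so it would key that record on "Anonymous" rather than "Anonymous Monk". A plain tab split is only adequate when no field can contain a tab or quoting; for real CSV data, Text::CSV as recommended above is the safer choice.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# kcott's extra record, with a real tab between the two columns.
my $record = "961\tAnonymous Monk";

# split ' ' splits on any run of whitespace (tabs AND spaces),
# so the space inside the name produces a third field.
my @by_whitespace = split ' ', $record;

# Splitting on the tab alone keeps the two columns intact.
my @by_tab = split /\t/, $record;

# prints "whitespace: 3 fields (961|Anonymous|Monk)"
printf "whitespace: %d fields (%s)\n", scalar(@by_whitespace), join( '|', @by_whitespace );

# prints "tab:        2 fields (961|Anonymous Monk)"
printf "tab:        %d fields (%s)\n", scalar(@by_tab), join( '|', @by_tab );
```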
Re: Very basic question while reading a file line by line by tybalt89 (Monsignor) on Dec 10, 2022 at 08:54 UTC
Why use a while loop and read line by line when there are other ways?

```perl
#!/usr/bin/perl

use strict; # https://perlmonks.org/?node_id=11148698
use warnings;
use List::AllUtils qw( uniq_by );

open my $fh, '<', \<<END;   # read from an in-memory file
id name
123 john
34 john
567 john
11 peter
899 peter
87 helen
END

<$fh>; # skip first line
print uniq_by { (split)[1] } <$fh>;
```
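List::AllUtils is a CPAN module rather than part of core Perl. If it isn't installed, the sketch below swaps in `grep` with the same `%seen` idiom used elsewhere in this thread. It reads the same sample data (assumed to be space-separated) and should print the same lines as the `uniq_by` version: the first line for each name, with the header skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same sample data, read from an in-memory filehandle.
my $data = <<'END';
id name
123 john
34 john
567 john
11 peter
899 peter
87 helen
END
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $header = <$fh>;    # skip (and ignore) the first line

# grep aliases $_ to each line in turn; (split)[1] is the name.
# The postfix ++ lets a name through only on its first appearance.
my %seen;
my @unique = grep { !$seen{ (split)[1] }++ } <$fh>;

print @unique;    # 123 john / 11 peter / 87 helen, one per line
```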
Re: Very basic question while reading a file line by line by Marshall (Canon) on Dec 10, 2022 at 09:01 UTC
Another solution:

```perl
use strict;
use warnings;

<DATA>;    # throw away first line
my %names;
while (<DATA>)
{
    my ($name) = (split ' ', $_)[1];
    print unless $names{$name}++;
}

=PRINTS
123 john
11 peter
87 helen
=cut

__DATA__
id name
123 john
34 john
567 john
11 peter
899 peter
87 helen
```
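The script above reads from `__DATA__`, while the question asks about reading a file line by line. Below is a sketch of the same loop wrapped in a hypothetical helper, `first_per_name()`, that accepts any already-open filehandle. The file name comes from the command line; `names.txt` in the usage comment is a placeholder, not something from the original question. The `defined` check is an added assumption that skips blank or single-field lines rather than warning on them.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns the first line for each name,
# reading from an already-open filehandle and skipping its header.
sub first_per_name {
    my ($fh) = @_;
    my %names;
    my @kept;

    my $header = <$fh>;    # throw away the first line

    # Perl implicitly wraps this condition in defined(), so a last
    # line of "0" would not end the loop early.
    while ( my $line = <$fh> ) {
        my $name = ( split ' ', $line )[1];
        next unless defined $name;    # skip blank or single-field lines
        push @kept, $line unless $names{$name}++;
    }
    return @kept;
}

# Usage: perl first_per_name.pl names.txt  (file name is hypothetical)
if (@ARGV) {
    my $infile = $ARGV[0];
    open my $fh, '<', $infile or die "Cannot open '$infile': $!";
    my @kept = first_per_name($fh);
    print @kept;
    close $fh;
}
```

Passing the filehandle in, rather than opening the file inside the helper, lets the same code be exercised with an in-memory handle, as the other replies do with heredocs and `__DATA__`.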

