PerlMonks  

Read file line by line and check equal lines

by Anonymous Monk
on Mar 06, 2007 at 06:59 UTC ( [id://603358] )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I have an input file like the following:

a1a
a1a
b1b
c1c
c1c
d1d
d1d
e1e
f1f
g1g
g1g
h1h
h1h
i1i
j1j

The input file is sorted. I want to find the lines that are present only once and write them to another file. Also, it is a huge file, so I cannot use an array or a hash. I want to use a while statement to read the file line by line and write the lines that appear only once to another file.

I tried the following code, but I could not reach the target, so I seek your great help here.

while (<DATA>) {
    chomp;
    $curr = $_;
    $line = $.;
    if ($line == 1) {
        $prev = $curr;
        next;
    }
    if ($line % 2 == 0 && $line > 2 && !$flag) {
        $prev = $curr;
        next;
    }
    else {
        $flag = 0;
    }
    if ($curr eq $prev) {
        $flag = 1;
        next;
    }
    print "$curr\n";
    $prev = $curr;
}

The output I need is:

b1b
e1e
f1f
i1i
j1j

Replies are listed 'Best First'.
Re: Read file line by line and check equal lines
by Util (Priest) on Mar 06, 2007 at 07:31 UTC

    1. use strict and use warnings.
    2. I don't think that your trick of $line % 2 == 0 can work; both singles and pairs can occur on even or odd lines.
    3. If you "pre-load" $prev before you begin your while(<>){...} loop, you can avoid keeping track of $line == 1 and $line >= 2.

    Here is my (loosely tested) version:

        #!perl
        use strict;
        use warnings;

        my $last_line  = <DATA>;
        my $seen_count = 1;

        while (<DATA>) {
            if ( $last_line ne $_ ) {
                print $last_line if $seen_count == 1;
                $last_line  = $_;
                $seen_count = 0;
            }
            $seen_count++;
        }
        print $last_line if $seen_count == 1;

        __END__
        a1a
        a1a
        b1b
        c1c
        c1c
        d1d
        d1d
        e1e
        f1f
        g1g
        g1g
        h1h
        h1h
        i1i
        j1j

Re: Read file line by line and check equal lines
by rinceWind (Monsignor) on Mar 06, 2007 at 07:53 UTC

    You may be looking for a Perl solution rather than a Unix one (perhaps you are learning Perl, or you are not running on a Unix platform), so I'll give you some Perl feedback.

    First, a comment about lexical variables and use strict;. Get yourself into the habit of declaring your variables with my, and only declaring them in the narrowest scope in which you need them. A Super Search on "use strict" will give you many answers that explain why this is a good idea.
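
    A minimal illustration of that habit (a sketch, not tied to the OP's data):

        use strict;
        use warnings;

        while ( my $line = <DATA> ) {    # $line is visible only inside this loop
            chomp $line;
            print "$line\n";
        }

        __DATA__
        some line
        another line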

    Secondly, your test for evenness of the line count is the wrong way to do things. Each time you get an odd number of singleton lines, it throws off the evenness check. In your own data, for example, the singleton e1e falls on line 8 (even) while f1f falls on line 9 (odd), so parity tells you nothing.

    Here's my stab:

    use strict;
    use warnings;

    my $prev;
    while (<DATA>) {
        chomp;
        my $curr = $_;
        # Note that $prev will be undef only on the first time round.
        if (!defined $prev) {
            $prev = $curr;
            next;
        }
        if ($curr ne $prev) {
            print "$prev\n";
            $prev = $curr;
        }
        else {
            undef $prev;
        }
    }
    print "$prev\n" if defined $prev;

    __DATA__
    a1a
    a1a
    b1b
    c1c
    c1c
    d1d
    d1d
    e1e
    f1f
    g1g
    g1g
    h1h
    h1h
    i1i
    j1j

    Note that you don't deal with, or say what you want to happen, when you get three or more identical lines in the input.

    --
    Apprentice wetware hacker

Re: Read file line by line and check equal lines
by graq (Curate) on Mar 06, 2007 at 08:38 UTC
    I thought I would tempt you with this version.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $previous;
    &unique while (<DATA>);

    sub unique {
        return if $previous and $previous eq $_;
        print;
        $previous = $_;
    }

    __DATA__
    a1a
    a1a
    b1b
    c1c
    c1c
    d1d
    d1d
    e1e
    f1f
    g1g
    g1g
    h1h
    h1h
    i1i
    j1j
    k1k
    k1k
    k1k

    -=( Graq )=-

      Why the sub call with a leading ampersand (&unique)? Since perl 5 that form is no longer necessary, and it has the side-effect of using (and exposing) the caller's @_ instead of building its own. You are not using @_ at all, so it's unnecessary.

      Instead, your sub uses two global variables: one lexical ($previous) and the package variable $_. That should be avoided except in very special cases. In this case it is hard to see why you use a sub at all. Just expand the code in the loop body; that would be clearer.
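
      For instance, a minimal sketch with the sub's body expanded into the loop (same behavior as the version above):

          my $previous;
          while (<DATA>) {
              next if $previous and $previous eq $_;
              print;
              $previous = $_;
          }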

      Anno

      That doesn't exactly solve the stated problem, does it? As I read it, the OP wants only those lines that appear exactly once in the input. What you've given is a way to display each distinct line from the input at most once.

      Here's a solution (untested, so there are probably boundary problems; for instance, a singleton on the very last line never gets printed) that only needs to keep at most 3 lines in memory, under the constraint that the lines are already sorted.

      #!/usr/bin/perl
      use strict;
      use warnings;

      my ($p1, $p2);
      while (<DATA>) {
          next unless $p1 and $p2;
          if ($p2 eq $p1) { $p2 = $p1 = undef; redo; }
          if ($p2 ne $p1) { print $p2; next; }
      }
      continue {
          $p2 = $p1;
          $p1 = $_;
      }

      __DATA__
      a1a
      a1a
      b1b
      c1c
      c1c
      d1d
      d1d
      e1e
      f1f
      g1g
      g1g
      h1h
      h1h
      i1i
      j1j
      k1k
      k1k
      k1k

Re: Read file line by line and check equal lines
by diotalevi (Canon) on Mar 06, 2007 at 07:14 UTC

    The unix program uniq with the -d parameter does exactly this.

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      Actually, no.
      If you look at the desired output provided by the OP it seems that he wants those lines that appear once, and once only.

      uniq -d prints duplicated lines, so that will not help him (or her). What he wants is uniq -u
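
      For example (file names are made up; this relies only on the input already being sorted, as the OP says):

          uniq -u input.txt > output.txt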

      --Darren :)

        Ah, the sweet success of using a tool exactly backwards!

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: Read file line by line and check equal lines
by McDarren (Abbot) on Mar 06, 2007 at 13:32 UTC
    "Also it is huge file so i cannot use array or hash."

    How huge?
    Have you tried it with a hash - you might be surprised :)

    Update: Note - as correctly pointed out by chrism01, the below won't work where you have odd numbers of duplicates. See below for a solution that I believe addresses that issue.

    Give the following a go:

    #!/usr/bin/perl -w
    use strict;

    my %wanted;
    while (<DATA>) {
        exists $wanted{$_} ? delete $wanted{$_} : $wanted{$_}++;
    }
    print sort keys %wanted;

    __DATA__
    a1a
    a1a
    b1b
    c1c
    c1c
    d1d
    d1d
    e1e
    f1f
    g1g
    g1g
    h1h
    h1h
    i1i
    j1j
    Output:
    b1b
    e1e
    f1f
    i1i
    j1j

    Update: or as a one-liner:

    perl -ne 'exists $x{$_}?delete $x{$_}:$x{$_}++;}{print for sort keys %x;' < input.txt > output.txt

    Try running that on your input file. The point about using a hash in that way is that at any moment you only hold keys for the lines seen an odd number of times so far, so when duplicates come in pairs only the genuinely unique lines remain at the end, which makes it quite memory-efficient. (The }{ in the one-liner closes the implicit while loop that -n wraps around the code and opens a bare block that runs once after the input is exhausted.) Whenever you are thinking "unique", a hash is almost certainly what you want.

    Cheers,
    Darren :)

      McDarren,
      I like your 1st version, but it seems to me it'll only work for even numbers of duplicates, e.g. if an item occurs 3 (5, 7, 9, ...) times, it'll be re-instated/preserved by your script?
      Of course, the OP's example file only has duplicates in 2s, but the description doesn't state whether this is always the case.
      I agree about using a hash, but I'd keep a count of all lines and test for cnt == 1 after looping through the input, as sketched below.
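
      A loose sketch of that counting approach (the script name is made up; run it as perl count_uniq.pl input.txt):

          #!/usr/bin/perl
          use strict;
          use warnings;

          my %count;
          while (<>) {
              $count{$_}++;    # tally every line, duplicates included
          }
          # only lines seen exactly once survive the filter
          print grep { $count{$_} == 1 } sort keys %count;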

      Cheers
      Chris

        Yes, you're absolutely correct - nice catch :)

        Here's an updated one-liner that addresses that problem in the way you suggest:

        perl -ne '$x{$_}++;}{for(sort keys %x){print if $x{$_}==1;}' < input.txt

        (I'm not a golfer by any stretch of the imagination, so I imagine that could be shortened significantly.)
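
        One marginally shorter variant, offered in the same untested-golf spirit:

            perl -ne '$x{$_}++}{$x{$_}==1&&print for sort keys %x' < input.txt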

        Cheers,
        Darren :)

Re: Read file line by line and check equal lines
by hangon (Deacon) on Mar 07, 2007 at 05:22 UTC

    Something similar to this should do it. The trick is not to update $lastline until you don't have a match. As a side note, in the past I have successfully loaded around 100K lines into an array. You may be surprised at what Perl can handle.

    use strict;
    use warnings;

    # (Added for completeness: take the file names from the command line.)
    my ($input_file, $output_file) = @ARGV;

    open (IN,  "$input_file")   or die "Cannot open $input_file: $!";
    open (OUT, ">$output_file") or die "Cannot open $output_file: $!";

    my $lastline = <IN>;
    print OUT $lastline;

    while (<IN>) {
        my $line = $_;
        if ($line eq $lastline) {
            next;
        }
        print OUT $line;
        $lastline = $line;
    }

    update: corrected typo

Re: Read file line by line and check equal lines
by thezip (Vicar) on Mar 07, 2007 at 06:49 UTC

    Update: My apologies -- I completely missed the line about "no arrays or hashes" -- sorry for the noise.

    This way has always worked for me:
    use strict;
    use warnings;
    use Data::Dumper;

    my %hash;
    open(IFH, "<", "data.txt");
    while (<IFH>) {
        chomp;
        # keep a running count of occurrences for each line string
        $hash{$_}++;
    }
    close IFH;

    my @uniq = sort grep { $hash{$_} == 1 } keys %hash;

    print Dumper(\%hash);
    print Dumper(\@uniq);

    __OUTPUT__
    $VAR1 = {
              'j1j' => 1,
              'i1i' => 1,
              'b1b' => 1,
              'a1a' => 2,
              'f1f' => 1,
              'e1e' => 1,
              'h1h' => 2,
              'c1c' => 2,
              'g1g' => 2,
              'd1d' => 2
            };
    $VAR1 = [
              'b1b',
              'e1e',
              'f1f',
              'i1i',
              'j1j'
            ];
    Where do you want *them* to go today?
Re: Read file line by line and check equal lines
by Moron (Curate) on Mar 06, 2007 at 13:00 UTC
    (update: tested and corrected by now)
    perl -e '$_{ $_ }++ for (<>); print grep { $_{$_}==1 } keys %_;' <input >output

    -M

    Free your mind

Re: Read file line by line and check equal lines
by thezip (Vicar) on Mar 07, 2007 at 07:51 UTC

    Perhaps this solution will work for you:

    use strict;
    use warnings;

    my @arr = ();
    open(IFH, "<", "data.txt");

    my $cur = scalar <IFH>;
    push @arr, $cur;

    # @arr contains, at most, N identical lines,
    # i.e. if "d1d" occurs five times in a row, then
    # @arr will contain the 5 occurrences of "d1d".
    # @arr is reset to one element as new strings
    # are encountered.
    while ($cur = <IFH>) {
        if ($cur eq $arr[0]) {
            push @arr, $cur;
        }
        else {
            # if here, we have a new string, so check
            # the size of @arr to see if the previous string was unique
            print $arr[0] if scalar(@arr) == 1;
            @arr = ($cur);
        }
    }
    print $arr[0] if scalar(@arr) == 1;
    close IFH;

    __OUTPUT__
    b1b
    e1e
    f1f
    i1i
    j1j

    Where do you want *them* to go today?
