PerlMonks  

Read file line by line and check equal lines

by Anonymous Monk
on Mar 06, 2007 at 06:59 UTC ( [id://603358] )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I have an input file like the following:

a1a
a1a
b1b
c1c
c1c
d1d
d1d
e1e
f1f
g1g
g1g
h1h
h1h
i1i
j1j

The input file is sorted. I want to find the lines that are present only once and write them to another file. Also, it is a huge file, so I cannot use an array or a hash. I want to use a while statement to read the file line by line and write the lines that appear only once to another file.

I tried the following code, but I could not reach the target, so I seek your great help here.

while (<DATA>) {
    chomp;
    $curr = $_;
    $line = $.;
    if ($line == 1) {
        $prev = $curr;
        next;
    }
    if ($line % 2 == 0 && $line > 2 && !$flag) {
        $prev = $curr;
        next;
    }
    else {
        $flag = 0;
    }
    if ($curr eq $prev) {
        $flag = 1;
        next;
    }
    print "$curr\n";
    $prev = $curr;
}

The output I need is:

b1b
e1e
f1f
i1i
j1j

Replies are listed 'Best First'.
Re: Read file line by line and check equal lines
by Util (Priest) on Mar 06, 2007 at 07:31 UTC

    1. use strict and use warnings.
    2. I don't think that your trick of $line % 2 == 0 can work; both singles and pairs can occur on even or odd lines.
    3. If you "pre-load" $prev before you begin your while(<>){...} loop, you can avoid keeping track of $line == 1 and $line >= 2.

    Here is my (loosely tested) version:

        #!perl
        use strict;
        use warnings;

        my $last_line  = <DATA>;
        my $seen_count = 1;

        while (<DATA>) {
            if ( $last_line ne $_ ) {
                print $last_line if $seen_count == 1;
                $last_line  = $_;
                $seen_count = 0;
            }
            $seen_count++;
        }
        print $last_line if $seen_count == 1;

        __END__
        a1a
        a1a
        b1b
        c1c
        c1c
        d1d
        d1d
        e1e
        f1f
        g1g
        g1g
        h1h
        h1h
        i1i
        j1j

Re: Read file line by line and check equal lines
by rinceWind (Monsignor) on Mar 06, 2007 at 07:53 UTC

    You may be looking for a Perl solution rather than a Unix one (perhaps you are learning Perl, or you are not running on a Unix platform), so I'll give you some Perl feedback.

    First, a comment about lexical variables and use strict;. Get yourself into the habit of declaring your variables with my, and only declaring them in the narrowest scope in which you need them. A Super Search on "use strict" will give you many answers that explain why this is a good idea.
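
    A minimal illustration of that habit (a sketch, not tied to the OP's data):

        use strict;
        use warnings;

        while ( my $line = <DATA> ) {    # $line is visible only inside this loop
            chomp $line;
            print "$line\n";
        }

        __DATA__
        some line
        another line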

    Secondly, your test for evenness of the line count is the wrong way to do things. Each time you get an odd number of singleton lines, it throws off the evenness check. In your own data, for example, the singleton e1e falls on line 8 (even) while f1f falls on line 9 (odd), so parity tells you nothing.

    Here's my stab:

    use strict;
    use warnings;

    my $prev;
    while (<DATA>) {
        chomp;
        my $curr = $_;
        # Note that $prev will be undef only on the first time round.
        if (!defined $prev) {
            $prev = $curr;
            next;
        }
        if ($curr ne $prev) {
            print "$prev\n";
            $prev = $curr;
        }
        else {
            undef $prev;
        }
    }
    print "$prev\n" if defined $prev;

    __DATA__
    a1a
    a1a
    b1b
    c1c
    c1c
    d1d
    d1d
    e1e
    f1f
    g1g
    g1g
    h1h
    h1h
    i1i
    j1j

    Note that you don't deal with, or say what you want to happen, when you get three or more identical lines in the input.

    --
    Apprentice wetware hacker

Re: Read file line by line and check equal lines
by graq (Curate) on Mar 06, 2007 at 08:38 UTC
    I thought I would tempt you with this version.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $previous;
    &unique while (<DATA>);

    sub unique {
        return if $previous and $previous eq $_;
        print;
        $previous = $_;
    }

    __DATA__
    a1a
    a1a
    b1b
    c1c
    c1c
    d1d
    d1d
    e1e
    f1f
    g1g
    g1g
    h1h
    h1h
    i1i
    j1j
    k1k
    k1k
    k1k

    -=( Graq )=-

      Why the sub call with a leading ampersand (&unique)? Since perl 5 that form is no longer necessary, and it has the side-effect of using (and exposing) the caller's @_ instead of building its own. You are not using @_ at all, so it's unnecessary.

      Instead, your sub uses two global variables: one lexical ($previous) and the package variable $_. That should be avoided except in very special cases. In this case it is hard to see why you use a sub at all. Just expand the code in the loop body; that would be clearer.
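
      For instance, a minimal sketch with the sub's body expanded into the loop (same behavior as the version above):

          my $previous;
          while (<DATA>) {
              next if $previous and $previous eq $_;
              print;
              $previous = $_;
          }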

      Anno

      That doesn't exactly solve the stated problem, does it? As I read it, the OP wants only those lines that appear exactly once in the input. What you've given is a way to display each distinct line from the input at most once.

      Here's a solution (untested, so there are probably boundary problems; for instance, a singleton on the very last line never gets printed) that only needs to keep at most 3 lines in memory, under the constraint that the lines are already sorted.

      #!/usr/bin/perl
      use strict;
      use warnings;

      my ($p1, $p2);
      while (<DATA>) {
          next unless $p1 and $p2;
          if ($p2 eq $p1) { $p2 = $p1 = undef; redo; }
          if ($p2 ne $p1) { print $p2; next; }
      }
      continue {
          $p2 = $p1;
          $p1 = $_;
      }

      __DATA__
      a1a
      a1a
      b1b
      c1c
      c1c
      d1d
      d1d
      e1e
      f1f
      g1g
      g1g
      h1h
      h1h
      i1i
      j1j
      k1k
      k1k
      k1k

Re: Read file line by line and check equal lines
by diotalevi (Canon) on Mar 06, 2007 at 07:14 UTC

    The unix program uniq with the -d parameter does exactly this.

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      Actually, no.
      If you look at the desired output provided by the OP it seems that he wants those lines that appear once, and once only.

      uniq -d prints duplicated lines, so that will not help him (or her). What he wants is uniq -u
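
      For example (file names are made up; this relies only on the input already being sorted, as the OP says):

          uniq -u input.txt > output.txt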

      --Darren :)

        Ah, the sweet success of using a tool exactly backwards!

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: Read file line by line and check equal lines
by McDarren (Abbot) on Mar 06, 2007 at 13:32 UTC
    "Also it is huge file so i cannot use array or hash."

    How huge?
    Have you tried it with a hash - you might be surprised :)

    Update: Note - as correctly pointed out by chrism01, the below won't work where you have odd numbers of duplicates. See below for a solution that I believe addresses that issue.

    Give the following a go:

    #!/usr/bin/perl -w
    use strict;

    my %wanted;
    while (<DATA>) {
        exists $wanted{$_} ? delete $wanted{$_} : $wanted{$_}++;
    }
    print sort keys %wanted;

    __DATA__
    a1a
    a1a
    b1b
    c1c
    c1c
    d1d
    d1d
    e1e
    f1f
    g1g
    g1g
    h1h
    h1h
    i1i
    j1j
    Output:
    b1b
    e1e
    f1f
    i1i
    j1j

    Update: or as a one-liner:

    perl -ne 'exists $x{$_}?delete $x{$_}:$x{$_}++;}{print for sort keys %x;' < input.txt > output.txt

    Try running that on your input file. The point about using a hash in that way is that at any moment you only hold keys for the lines seen an odd number of times so far, so when duplicates come in pairs only the genuinely unique lines remain at the end, which makes it quite memory-efficient. (The }{ in the one-liner closes the implicit while loop that -n wraps around the code and opens a bare block that runs once after the input is exhausted.) Whenever you are thinking "unique", a hash is almost certainly what you want.

    Cheers,
    Darren :)

      McDarren,
      I like your 1st version, but it seems to me it'll only work for even numbers of duplicates, e.g. if an item occurs 3 (5, 7, 9, ...) times, it'll be re-instated/preserved by your script?
      Of course, the OP's example file only has duplicates in 2s, but the description doesn't state whether this is always the case.
      I agree about using a hash, but I'd keep a count of all lines and test for cnt == 1 after looping through the input, as sketched below.
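
      A loose sketch of that counting approach (the script name is made up; run it as perl count_uniq.pl input.txt):

          #!/usr/bin/perl
          use strict;
          use warnings;

          my %count;
          while (<>) {
              $count{$_}++;    # tally every line, duplicates included
          }
          # only lines seen exactly once survive the filter
          print grep { $count{$_} == 1 } sort keys %count;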

      Cheers
      Chris

        Yes, you're absolutely correct - nice catch :)

        Here's an updated one-liner that addresses that problem in the way you suggest:

        perl -ne '$x{$_}++;}{for(sort keys %x){print if $x{$_}==1;}' < input.txt

        (I'm not a golfer by any stretch of the imagination, so I imagine that could be shortened significantly.)
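
        One marginally shorter variant, offered in the same untested-golf spirit:

            perl -ne '$x{$_}++}{$x{$_}==1&&print for sort keys %x' < input.txt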

        Cheers,
        Darren :)

Re: Read file line by line and check equal lines
by hangon (Deacon) on Mar 07, 2007 at 05:22 UTC

    Something similar to this should do it. The trick is not to update $lastline until you don't have a match. As a side note, in the past I have successfully loaded around 100K lines into an array. You may be surprised at what Perl can handle.

    use strict;
    use warnings;

    # (Added for completeness: take the file names from the command line.)
    my ($input_file, $output_file) = @ARGV;

    open (IN,  "$input_file")   or die "Cannot open $input_file: $!";
    open (OUT, ">$output_file") or die "Cannot open $output_file: $!";

    my $lastline = <IN>;
    print OUT $lastline;

    while (<IN>) {
        my $line = $_;
        if ($line eq $lastline) {
            next;
        }
        print OUT $line;
        $lastline = $line;
    }

    update: corrected typo

Re: Read file line by line and check equal lines
by thezip (Vicar) on Mar 07, 2007 at 06:49 UTC

    Update: My apologies -- I completely missed the line about "no arrays or hashes" -- sorry for the noise.

    This way has always worked for me:
    use strict;
    use warnings;
    use Data::Dumper;

    my %hash;
    open(IFH, "<", "data.txt");
    while (<IFH>) {
        chomp;
        # keep a running count of occurrences for each line string
        $hash{$_}++;
    }
    close IFH;

    my @uniq = sort grep { $hash{$_} == 1 } keys %hash;

    print Dumper(\%hash);
    print Dumper(\@uniq);

    __OUTPUT__
    $VAR1 = {
              'j1j' => 1,
              'i1i' => 1,
              'b1b' => 1,
              'a1a' => 2,
              'f1f' => 1,
              'e1e' => 1,
              'h1h' => 2,
              'c1c' => 2,
              'g1g' => 2,
              'd1d' => 2
            };
    $VAR1 = [
              'b1b',
              'e1e',
              'f1f',
              'i1i',
              'j1j'
            ];
    Where do you want *them* to go today?
Re: Read file line by line and check equal lines
by Moron (Curate) on Mar 06, 2007 at 13:00 UTC
    (update: tested and corrected by now)
    perl -e '$_{ $_ }++ for (<>); print grep { $_{$_}==1 } keys %_;' <input >output

    -M

    Free your mind

Re: Read file line by line and check equal lines
by thezip (Vicar) on Mar 07, 2007 at 07:51 UTC

    Perhaps this solution will work for you:

    use strict;
    use warnings;

    my @arr = ();
    open(IFH, "<", "data.txt");

    my $cur = scalar <IFH>;
    push @arr, $cur;

    # @arr contains, at most, N identical lines,
    # i.e. if "d1d" occurs five times in a row, then
    # @arr will contain the 5 occurrences of "d1d".
    # @arr is reset to one element as new strings
    # are encountered.
    while ($cur = <IFH>) {
        if ($cur eq $arr[0]) {
            push @arr, $cur;
        }
        else {
            # if here, we have a new string, so check
            # the size of @arr to see if the previous string was unique
            print $arr[0] if scalar(@arr) == 1;
            @arr = ($cur);
        }
    }
    print $arr[0] if scalar(@arr) == 1;
    close IFH;

    __OUTPUT__
    b1b
    e1e
    f1f
    i1i
    j1j

    Where do you want *them* to go today?
