PerlMonks  

finding unique items in an array, from a text file

by spazmospazmo (Initiate)
on Jan 13, 2009 at 18:54 UTC ( [id://736041]=perlquestion )

spazmospazmo has asked for the wisdom of the Perl Monks concerning the following question:

Hey, guys! I'm a perl newb and I'm trying to open a text file, push the lines into an array and then identify which items are unique. I'm having problems with the script, and any advice will be appreciated. The script follows. Thanks!
#! C:\Perl\bin\perl.exe
use strict;
my $file = "controls.txt";
open (FH, "< $file") or die "Can't open $file for read: $!";
my @lines;
while (<FH>) {
    push (@lines, $_);
    %seen = ();
    @uniq = ();
    foreach @lines {
        unless ($seen{$item}) {
            #if we get here, we have not seen it before
            $seen{$item} = 1;
            push(@uniq, $item);
        }
    }
}
close FH or die "Cannot close $file: $!";
print @lines; # see if it worked

Re: finding unique items in an array, from a text file
by ikegami (Patriarch) on Jan 13, 2009 at 20:23 UTC

    By the way,

    my %seen;
    my @uniq;
    foreach my $item (@lines) {
        unless ($seen{$item}) {
            #if we get here, we have not seen it before
            $seen{$item} = 1;
            push(@uniq, $item);
        }
    }

    can be written as

    my %seen;
    my @uniq = grep !$seen{$_}++, @lines;
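    Why the grep version works: the post-increment $seen{$_}++ returns the value the key had before incrementing. That is undef (false) the first time a line is seen and a true count on every later sighting, so the ! lets only first occurrences through, in their original order. A minimal sketch that wraps the idiom in a hypothetical uniq_lines sub and feeds it made-up sample data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the elements of @_ in their original
# order, keeping only the first occurrence of each value.
sub uniq_lines {
    my %seen;
    # $seen{$_}++ yields the count *before* incrementing:
    # undef (false) on the first sighting, then 1, 2, ... afterwards.
    return grep { !$seen{$_}++ } @_;
}

my @lines = qw(b a b c a);    # made-up sample data
print join(", ", uniq_lines(@lines)), "\n";    # prints "b, a, c"
```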
Re: finding unique items in an array, from a text file
by tomfahle (Priest) on Jan 13, 2009 at 19:12 UTC

    A modified version of your code

    #!/usr/bin/perl
    use strict;
    use warnings;
    my $file = "controls.txt";
    # always(!!!) use the three argument form of open
    open (FH, "<", $file) or die "Can't open $file for read: $!";
    my @lines;
    while (<FH>) {
        chomp;
        push (@lines, $_);
    }
    close(FH) or die "Cannot close $file: $!";
    my %seen = ();
    my @uniq = ();
    foreach my $line ( @lines ) {
        unless ($seen{$line}) {
            #if we get here, we have not seen it before
            $seen{$line} = 1;
            push(@uniq, $line);
        }
    }
    print join(", ",@uniq) , "\n"; # see if it worked

    controls.txt contains

    1
    1
    2
    3
    3
    3
    3
    3
    5
    5
    6
    6
    7
    7

    which gives the following output:

    1, 2, 3, 5, 6, 7

    Hope this helps.
    Thomas

      This is good, but I think it may be overwhelming. I think what he needs is to see what's wrong with his script before we show any tricks (I know, it's standard Perl).

      For example, show how his script can be made to work while staying as close as possible to what it is right now.

      What is probably happening is that you're getting errors about undeclared symbols, such as %seen.

      Also, you need to separate the stage where you read in the lines from the stage where you do something with them, such as identifying the unique ones.

      The following example gets closer to what you want, though it will still have errors:

      #! C:\Perl\bin\perl.exe
      use strict;
      my $file = "controls.txt";
      open (FH, "< $file") or die "Can't open $file for read: $!";
      my @lines;
      while (<FH>) {
          push (@lines, $_);
      }
      my %seen = ();
      my @uniq = ();
      foreach @lines {
          unless ($seen{$item}) {
              #if we get here, we have not seen it before
              $seen{$item} = 1;
              push(@uniq, $item);
          }
      }
      close FH or die "Cannot close $file: $!";
      print @lines; # see if it worked

      Remember, under use strict every variable ($variable_name) must be given a scope (the range of code where it is valid) with my. You can't just introduce a $item variable without declaring and scoping it, which is what you are correctly doing with my @lines, my $file, and so on.
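      The remaining compile errors in that loop (the bare foreach @lines and the undeclared $item) both go away once the loop variable is declared with my and the list is put in parentheses. A minimal sketch, with a hypothetical in-memory list standing in for the lines of controls.txt, and printing @uniq rather than @lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in data (hypothetical); the original script reads these from controls.txt.
my @lines = ("1\n", "1\n", "2\n", "3\n", "3\n");

my %seen = ();
my @uniq = ();
# "foreach my $item (@lines)" declares $item with my (satisfying
# use strict) and supplies the parentheses that foreach requires.
foreach my $item (@lines) {
    unless ($seen{$item}) {
        # if we get here, we have not seen it before
        $seen{$item} = 1;
        push(@uniq, $item);
    }
}
print @uniq;    # print the unique lines, not all of @lines
```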

      Yes! Thank you, Thomas! That's exactly what I needed. Also, I can learn from this. Thanks to all of you for your comments and support! Tom Tolleson
Re: finding unique items in an array, from a text file
by kyle (Abbot) on Jan 13, 2009 at 19:33 UTC

    Since someone's already fixed the OP's code to do what was intended, I'll put in how I might have written this.

    Unlike another short solution posted elsewhere, this keeps the lines in their original order while filtering out the duplicates.
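    As an illustration only (not necessarily how kyle wrote it), one way to keep the original order while dropping duplicates is to test each line as it is read, so no intermediate @lines array is needed. This sketch opens a filehandle on a string, a hypothetical stand-in for controls.txt, and chomps each line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for controls.txt: open a filehandle on a string.
my $data = "1\n1\n2\n3\n3\n5\n";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";

my %seen;
my @uniq;
while (my $line = <$fh>) {
    chomp $line;
    # Keep a line only the first time it appears; order is preserved
    # because lines are pushed in the order they are read.
    push @uniq, $line unless $seen{$line}++;
}
close($fh) or die "Cannot close in-memory file: $!";

print join(", ", @uniq), "\n";    # prints "1, 2, 3, 5"
```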

Re: finding unique items in an array, from a text file
by Corion (Patriarch) on Jan 13, 2009 at 18:57 UTC

    What problems do you have? You already have added comments, so maybe you can help us to help you better by describing what the program does, what you expect it to do, and how the two differ. Also, you can help us by supplying example data which demonstrates your points.

Re: finding unique items in an array, from a text file
by andye (Curate) on Jan 13, 2009 at 19:28 UTC
    Hiya. It looks a bit overcomplicated; this should be really simple. IMHO you don't need @lines, @uniq and %seen: you only need one hash to do this.

    I'd prefer to do something like this (off the top of my head, code not tested):

    # open the file into FH
    my %uniq;
    $uniq{$_} = 1 while (<FH>);
    print join "\n", keys %uniq;
    # close the file

    Hope that helps. All the best.

      I think there might be a couple of problems with your code.

      • You join the output with \ns but you did not chomp the input so you will get extra blank lines.
      • Hash keys are unordered so the line order of the original file will be scrambled. This may or may not be an issue.
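      A minimal sketch of the one-hash approach with both points addressed: chomp strips each newline before the line becomes a hash key, and sort makes the output order predictable (sorted, which is not the same as the file's original order). The in-memory filehandle and its data are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the input file.
my $data = "b\na\nb\nc\n";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";

my %uniq;
while (<$fh>) {
    chomp;            # drop the newline so join "\n" supplies the only one
    $uniq{$_} = 1;    # a duplicate key just overwrites the earlier entry
}
close($fh);

# keys() returns keys in no particular order; sort them so the output
# is stable (alphabetical, not the file's original line order).
my @out = sort keys %uniq;
print join("\n", @out), "\n";    # prints a, b, c on separate lines
```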

      The grep solutions suggested might be better if line order is to be preserved. Something like this (again, not tested):

      ...
      my %seen = ();
      print grep ! $seen{ $_ } ++, <$fh>;
      ...
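      Filled out into a runnable sketch (the in-memory filehandle and its data are hypothetical), this works because grep puts <$fh> in list context, so every remaining line is read and then filtered in its original order. The newlines are kept, so print needs no separator. Note that list context reads the whole file into memory at once:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = "3\n1\n3\n2\n1\n";    # hypothetical file contents
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";

my %seen = ();
# <$fh> in list context returns all remaining lines; grep keeps a line
# only when $seen{ $_ }++ returns false, i.e. on its first appearance.
my @uniq = grep ! $seen{ $_ } ++, <$fh>;
close($fh);

print @uniq;    # prints 3, 1, 2 (one per line, original order)
```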

      Cheers,

      JohnGG

        All good points. ;)
Re: finding unique items in an array, from a text file
by mr_mischief (Monsignor) on Jan 13, 2009 at 19:28 UTC
    If this is homework, you should probably tell us. People don't mind helping, but we don't want to do your homework for you.

    I'm assuming this isn't homework, then. You're overthinking things, for one. Simplify it. There's no need to build an array of lines just to build a hash of lines, for example.

    Beyond overthinking the problem, you have some specific issues with your code:

    1. Your version is not even syntactically correct.
    2. You check $item but you never assign anything to it. Either use the default variable ($_) or specify a loop variable.
    3. You say you're wanting to find unique lines, but you are trying to print all lines instead of the unique ones.

    use strict;
    use warnings; # you left this out
    my $file = "controls.txt";
    open (my $fh, '<', $file) or die "Can't open $file for read: $!";
    # Don't use two-arg open unless you know why, and use
    # lexical filehandles unless you know of a reason not to.
    my %seen;
    while ( <$fh> ) {
        $seen{$_} = 1; # Later assignments to an identical key
                       # overwrite earlier ones, so there's really
                       # no need to check if all you want is uniqueness.
    }
    my @unique = keys %seen;
    print @unique;

    A very simplified version could boil down to this:

    perl -le '$seen{$_}++ while <>; print keys %seen;'

    Under perl 5.10, this is a decent Perl golf version:

    perl -lE'$s{$_}++for<>;say keys%s'

    Don't play golf with code you intend to use for serious work or for a school project. I did it here just to show you how much you are complicating the task for yourself. At the level at which Perl lets you express it, very little actually has to happen in this code. The only steps you need are:

    1. read the file line by line, optionally with an explicit open and error check (which if you're using it for serious work would be "less optional")
    2. assign to a data structure that maintains uniqueness of keys (hash)
    3. gather and print the data as limited by the previous step (keys returns all the keys of a hash, and then that gets printed)

    The for loop in the golf version reads every line into a list in memory before the loop starts, which you don't need; everything can be done with a while loop, which reads one line at a time. Some may consider that distinction a premature optimization, but it's worth considering if you're working with large files.
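    To make the while/for distinction concrete: for (<$fh>) evaluates the readline in list context, building a list of every line before the loop body runs, while while (<$fh>) reads one line per iteration. A small sketch with hypothetical in-memory data and two hypothetical subs, count_unique_while and count_unique_for; both give the same answer, and only their memory use differs:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = "x\ny\nx\nz\n";    # hypothetical file contents

# Hypothetical helper: count unique lines, reading one line at a time.
sub count_unique_while {
    my ($fh) = @_;
    my %seen;
    $seen{$_}++ while <$fh>;    # scalar context: one line per iteration
    return scalar keys %seen;
}

# Same result, but <$fh> in list context reads every line up front.
sub count_unique_for {
    my ($fh) = @_;
    my %seen;
    $seen{$_}++ for <$fh>;      # list context: whole file held in memory
    return scalar keys %seen;
}

open(my $fh1, '<', \$data) or die "Can't open: $!";
open(my $fh2, '<', \$data) or die "Can't open: $!";
print count_unique_while($fh1), " ", count_unique_for($fh2), "\n";    # prints "3 3"
```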
Re: finding unique items in an array, from a text file
by leocharre (Priest) on Jan 13, 2009 at 19:11 UTC
    What's the problem, that is, what output do you get when you run it? You can add the output to your original post.

    Perl is pretty damn good about telling you about errors and why they happen. The messages may seem cryptic, but slow down and read them carefully: they don't lie, and although they are terse, they tell you what you need to fix your problem.

Node Type: perlquestion [id://736041]
Approved by Corion