finding unique items in an array, from a text file

spazmospazmo has asked for the wisdom of the Perl Monks concerning the following question:

Re: finding unique items in an array, from a text file by ikegami (Patriarch) on Jan 13, 2009 at 20:23 UTC
By the way,

    my %seen;
    my @uniq;
    foreach my $item (@lines) {
        unless ($seen{$item}) {
            # if we get here, we have not seen it before
            $seen{$item} = 1;
            push(@uniq, $item);
        }
    }

can be written as

    my %seen;
    my @uniq = grep !$seen{$_}++, @lines;
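To make the `grep` idiom concrete, here is a minimal sketch wrapped in a sub. The name `uniq_in_order` and the sample data are made up for illustration. The key point is that `$seen{$_}++` returns the count from before the increment, so `!` is true only the first time a value is seen.

```perl
use strict;
use warnings;

# Hypothetical helper: return the items of a list with duplicates
# removed, keeping the first occurrence of each in its original position.
sub uniq_in_order {
    my %seen;
    # $seen{$_}++ yields the count *before* the increment, so it is
    # false the first time a value appears and true on every repeat.
    return grep { !$seen{$_}++ } @_;
}

my @lines = qw(3 1 3 2 1);
my @uniq  = uniq_in_order(@lines);
print join(", ", @uniq), "\n";    # prints "3, 1, 2"
```

Recent versions of the core List::Util module also export a `uniq` function that does the same job, if your Perl is new enough to have it.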
Re: finding unique items in an array, from a text file by tomfahle (Priest) on Jan 13, 2009 at 19:12 UTC
A modified version of your code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file = "controls.txt";

    # always(!!!) use the three argument form of open
    open (FH, "<", $file) or die "Can't open $file for read: $!";

    my @lines;
    while (<FH>) {
        chomp;
        push (@lines, $_);
    }
    close(FH) or die "Cannot close $file: $!";

    my %seen = ();
    my @uniq = ();
    foreach my $line ( @lines ) {
        unless ($seen{$line}) {
            # if we get here, we have not seen it before
            $seen{$line} = 1;
            push(@uniq, $line);
        }
    }
    print join(", ", @uniq), "\n";    # see if it worked

controls.txt contains (one value per line) `1 1 2 3 3 3 3 3 5 5 6 6 7 7`, which gives the following output:

    1, 2, 3, 5, 6, 7

Hope this helps. Thomas
Re^2: finding unique items in an array, from a text file by leocharre (Priest) on Jan 13, 2009 at 19:16 UTC
This is good but I think it may be ~~overload~~ overwhelming. I think what he needs is to see what's wrong with his script before we show any tricks (I know, standard Perl). For example, show how his script can be made to work while staying as close as possible to what it is right now.

What is probably happening is that you're getting errors about undeclared symbols, like %seen, for example. Also, what you need to do is separate the stage where you read in the lines from the place where you do something with them, like identifying the unique ones. The following example may get closer to what you want, though it will still have errors:

    #! C:\Perl\bin\perl.exe
    use strict;

    my $file = "controls.txt";
    open (FH, "< $file") or die "Can't open $file for read: $!";

    my @lines;
    while (<FH>) {
        push (@lines, $_);
    }

    my %seen = ();
    my @uniq = ();
    foreach @lines {
        unless ($seen{$item}) {
            # if we get here, we have not seen it before
            $seen{$item} = 1;
            push(@uniq, $item);
        }
    }
    close FH or die "Cannot close $file: $!";
    print @lines;    # see if it worked

Remember, every symbol ($variable_name) must be given a scope (the range of code in which it is valid) with `my`. You can't just introduce a `$item` variable if it has not been declared and scoped, which is what you are correctly doing with `my @lines`, `my $file`, etc.
Re^2: finding unique items in an array, from a text file by thomastolleson (Initiate) on Jan 13, 2009 at 19:44 UTC
Yes! Thank you, Thomas! That's exactly what I needed. Also, I can learn from this. Thanks to all of you for your comments and support! Tom Tolleson
Re: finding unique items in an array, from a text file by kyle (Abbot) on Jan 13, 2009 at 19:33 UTC
Since someone's already fixed the OP's code to do what was intended, I'll put in how I might have written this. [spoiler: code not shown] Unlike another short solution posted elsewhere, this keeps the lines in their original order while filtering out the duplicates.
Re: finding unique items in an array, from a text file by Corion (Patriarch) on Jan 13, 2009 at 18:57 UTC
What problems do you have? You have already added comments, so maybe you can help us to help you better by describing what the program does, what you expect it to do, and how the two differ. Also, you can help us by supplying example data which demonstrates your points.
Re: finding unique items in an array, from a text file by andye (Curate) on Jan 13, 2009 at 19:28 UTC
hiya - looks a bit overcomplicated, should be really simple. IMHO you don't need @lines, @uniq and %seen... you only need one hash to do this. I'd prefer to do something like this (off the top of my head, code not tested):

    # open the file into FH
    my %uniq;
    $uniq{$_} = 1 while (<FH>);
    print join "\n", keys %uniq;
    # close the file

Hope that helps. All the best.
Re^2: finding unique items in an array, from a text file by johngg (Canon) on Jan 13, 2009 at 23:27 UTC
I think there might be a couple of problems with your code.

- You join the output with `\n`s but you did not chomp the input, so you will get extra blank lines.
- Hash keys are unordered, so the line order of the original file will be scrambled. This may or may not be an issue.

The grep solutions suggested might be better if line order is to be preserved. Something like this (again, not tested):

    ...
    my %seen = ();
    print grep ! $seen{ $_ } ++, <$fh>;
    ...

Cheers, JohnGG
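If line order really does not matter, the single-hash approach can be kept once the two problems above are dealt with. The sketch below is only an illustration: the sub name `unique_lines_sorted` and the sample data are invented, and sorting the keys is an added step (not part of either reply) so that the output is at least stable from run to run.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the unique lines of an open filehandle
# using only a hash, as in the single-hash approach.
sub unique_lines_sorted {
    my ($fh) = @_;
    my %uniq;
    while ( my $line = <$fh> ) {
        chomp $line;          # strip the newline so join "\n" adds no blank lines
        $uniq{$line} = 1;
    }
    # keys() returns the lines in no particular order; sorting is an
    # added step (an assumption of this sketch) to make the output stable.
    return sort keys %uniq;
}

# In-memory filehandle standing in for a real file (sample data is made up).
my $data = "pear\napple\npear\nfig\napple\n";
open( my $fh, '<', \$data ) or die "Can't open in-memory file: $!";
print join( "\n", unique_lines_sorted($fh) ), "\n";
close($fh);
```

Sorted order is not the same thing as original file order, so the `grep` form is still the one to reach for when first-seen order has to be preserved.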
Re^3: finding unique items in an array, from a text file by andye (Curate) on Jan 14, 2009 at 12:41 UTC
All good points. ;)
Re: finding unique items in an array, from a text file by mr_mischief (Monsignor) on Jan 13, 2009 at 19:28 UTC
If this is homework, you should probably tell us. People don't mind helping, but we don't want to do your homework for you. I'm assuming this isn't homework, then.

You're overthinking things, for one. Simplify it. There's no need to build an array of lines just to build a hash of lines, for example.

Beyond overthinking the problem, you have some specific issues with your code:

- Your version is not even syntactically correct.
- You check `$item` but you never assign anything to it. Either use the default variable (`$_`) or specify a loop variable.
- You say you want to find unique lines, but you are trying to print all lines instead of the unique ones.

    use strict;
    use warnings;    # you left this out

    my $file = "controls.txt";
    open (my $fh, '<', $file) or die "Can't open $file for read: $!";
    # Don't use two-arg open unless you know why, and use
    # lexical filehandles unless you know of a reason not to.

    my %seen;
    while ( <$fh> ) {
        $seen{$_} = 1;    # Later assignments to an identical key
                          # overwrite earlier ones, so there's really
                          # no need to check if all you want is uniqueness.
    }

    my @unique = keys %seen;
    print @unique;

This could, for a very simplified version, boil down to this:

    perl -le '$seen{$_}++ while <>; print keys %seen;'

Under perl 5.10, this is a decent Perl golf version:

    perl -lE'$s{$_}++for<>;say keys%s'

Don't play golf with code you intend to use for serious work or for a school project. I did it here just to show you how much you are complicating the task for yourself. There are actually very few things going on in this code on the level at which Perl is capable of processing it for you.
The only steps you need are:

1. read the file line by line, optionally with an explicit open and error check (which, if you're using it for serious work, would be "less optional")
2. assign to a data structure that maintains uniqueness of keys (a hash)
3. gather and print the data as limited by the previous step (`keys` returns all the keys of a hash, and then that gets printed)

The `for` loop in the golf version reads every line into a list before iterating, which makes an extra copy in memory that you don't need. Everything you need to do can be done with a `while` loop. Some may consider that distinction a premature optimization. It's something to consider, though, if you're working with large files.
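The `while`-loop point can be pushed one step further: if each line is printed the first time it is seen, nothing but the hash has to be held in memory, and the original order comes out for free. This is a sketch of that variation, not the code from the post above; the sub name `print_unique` and the sample data are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out, skipping any line
# that has already been written. Only the hash of seen lines is kept in
# memory, never the whole file.
sub print_unique {
    my ( $in, $out ) = @_;
    my %seen;
    while ( my $line = <$in> ) {
        print {$out} $line unless $seen{$line}++;
    }
    return scalar keys %seen;    # number of distinct lines
}

# In-memory filehandle standing in for a real file (sample data is made up).
my $input = "1\n1\n2\n3\n3\n5\n";
open( my $in, '<', \$input ) or die "Can't open in-memory input: $!";
my $count = print_unique( $in, \*STDOUT );
close($in);
```

The lines are compared with their newlines still attached, so a final line that lacks a trailing newline would count as different from the same text elsewhere in the file. On the command line the same idea is commonly written as `perl -ne 'print unless $seen{$_}++' controls.txt`.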
Re: finding unique items in an array, from a text file by leocharre (Priest) on Jan 13, 2009 at 19:11 UTC
What's the problem? That is, what is your output when you run it? You can post the output in your original post. Perl is pretty damn good about telling you about errors and why. The messages may seem cryptic; just slow down and read them carefully. They don't lie, and they tell you as little as possible so you can fix your problem.

