What does !$saw{$_}++ means

nwkcmk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm an amateur in Perl. I saw on the web that the following code is used to remove duplicates from an array:

undef %saw;
@out = grep(!$saw{$_}++, @in);

It looks like !$saw{$_}++ is too much for me to digest. Hopefully someone can explain to me what this expression means.


Replies are listed 'Best First'.
Re: What does !$saw{$_}++ means by Corion (Patriarch) on Jan 26, 2005 at 08:39 UTC
Let's split it up a bit: You have a hash, `%saw`. Individual elements of a hash are accessed via `$hash{key}`. If you append the postfix `++` operator to it, it looks like `$saw{key}++`, which increments the hash element by one.

Now, what does that do? The `grep` iterates over the whole array `@in`, sets `$_` to each element in turn, and then evaluates the code, in our case the expression `!$saw{$_}++`. If the expression returns a true value, `grep` keeps the array element in its result; otherwise the element is discarded.

Now, when a key in `%saw` does not already exist, the code sets `$saw{key}` to 1 (incrementing by one from undef), and then returns the negation of the previous hash value (undef is false, so it returns true). So, if the hash key did not yet exist in the hash, the array element is put into `@out`.

The other case is that the hash key already exists in the hash. Then `$saw{$_}` returns a number greater than zero, which is interpreted as true, and the negation of that is false, so the (duplicate) array element in `@in` is discarded.

This method is a nice and easy way (once you understand it) to get a unique list of elements in an array while retaining the order. There are other methods, like using the keys of `%saw`:

`undef %saw; $saw{$_}++ for @in; @out = keys %saw;`

This code puts the same elements into `@out`, but you lose the order. I don't remember if it was quicker, but there also is the non-looped version:

`undef %saw; @saw{@in} = (1) x @in; @out = keys %saw;`
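To make the walk-through above concrete, here is a minimal sketch that unrolls the `grep` into an explicit loop and records what `!$saw{$_}++` decides on each visit. The input list and the variable names `@trace`, `$before` and `$keep` are made up for illustration; they are not from the original snippet.

```perl
use strict;
use warnings;

# Hypothetical sample input; any list of strings would do.
my @in = qw(apple pear apple fig pear apple);

my %saw;
my @out;
my @trace;
for my $item (@in) {
    my $before = $saw{$item} // 0;    # count before this visit (0 if never seen)
    my $keep   = !$saw{$item}++;      # same expression as in the grep
    push @trace, sprintf("%-6s before=%d keep=%s",
                         $item, $before, $keep ? 'yes' : 'no');
    push @out, $item if $keep;        # grep would keep the element here
}

print "$_\n" for @trace;
print "out: @out\n";
```

Only the first visit of each string sees a false (not yet set) value in `%saw`, so only first occurrences end up in `@out`, in their original order.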
Re^2: What does !$saw{$_}++ means by ysth (Canon) on Jan 26, 2005 at 09:45 UTC
"returns the negation of the previous hash value (undef is false, so it returns true)"

Nit: post-increment returns 0 if the value was undef, not undef. This is occasionally criticised, defended, pointed out as inconsistent with post-decrement (which does return undef), and then dropped. A comment in pp.c refers one to http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-03/msg00536.html for further info (where you can read the criticism, defense, etc.).
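The nit is easy to check for yourself. A small sketch, assuming two throwaway scalars (`$up` and `$down` are hypothetical names) that both start out undefined:

```perl
use strict;
use warnings;

my ($up, $down);            # both start out undef

my $from_inc = $up++;       # post-increment of an undef variable
my $from_dec = $down--;     # post-decrement of an undef variable

printf "inc returned: %s\n", defined $from_inc ? "'$from_inc'" : 'undef';
printf "dec returned: %s\n", defined $from_dec ? "'$from_dec'" : 'undef';
print  "now: up=$up down=$down\n";
```

Either way the result is false, so the difference does not change what `!$saw{$_}++` does inside the `grep`.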
Re^2: What does !$saw{$_}++ means by Eimi Metamorphoumai (Deacon) on Jan 26, 2005 at 13:36 UTC
"I don't remember if it was quicker, but there also is the non-looped version: `undef %saw; @saw{@in} = (1) x @in; @out = keys %saw;`"

I haven't benchmarked it, but I've frequently heard it asserted that even faster is

`undef %saw; @saw{@in} = (); @out = keys %saw;`

That is, using undef as your values (so you don't need the increment or to create the list of ones).
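For what it's worth, the three `keys`-based variants mentioned so far should all produce the same set of keys, just in no guaranteed order. A rough sketch with a made-up input list (the hash names `%count`, `%ones` and `%undefs` are only for illustration, and this says nothing about which variant is faster):

```perl
use strict;
use warnings;

my @in = qw(b a b c a);   # hypothetical input

# Count every element.
my %count;
$count{$_}++ for @in;

# Hash slice assigned a list of ones.
my %ones;
@ones{@in} = (1) x @in;

# Hash slice assigned the empty list (the values stay undef).
my %undefs;
@undefs{@in} = ();

# keys() returns the keys in no particular order, so sort before comparing.
my @by_count  = sort keys %count;
my @by_ones   = sort keys %ones;
my @by_undefs = sort keys %undefs;

print "count:  @by_count\n";
print "ones:   @by_ones\n";
print "undefs: @by_undefs\n";

# The keys exist even though their values are undef.
print "key exists, value undefined\n"
    if exists $undefs{a} && !defined $undefs{a};
```

The sort is only there to make the comparison stable; it is not a way to get the original order of `@in` back.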
Re^2: What does !$saw{$_}++ means by ikegami (Patriarch) on Jan 26, 2005 at 17:06 UTC
"but there also is the non-looped version:"

Except that `x` is a loop operator, just like `map`. In fact, it not only loops for the number of duplications, it also implicitly loops over each element of the list on the LHS. `@saw{@in}` and `keys(%saw)` are also loops, but they are implicit, unlike `x`.

In fact, you can remove two of the four loops of your "non-looped version":

`undef %saw; undef(@saw{@in}); @out = keys(%saw);`

Update: `undef %saw` is also an implicit loop, on calls other than the first.
Re: What does !$saw{$_}++ means by Zaxo (Archbishop) on Jan 26, 2005 at 08:49 UTC
With no change to the idiom, I'd write that as,

`my %saw; my @out = grep { ! $saw{$_}++ } @in;`

`%saw` is a hash, initialized empty in the my declaration. I changed that to keep from eliminating and then automatically reproducing (autovivifying) a global `%saw`.

The expression `$saw{$_}++` adds one to the value associated with the key that is `$_`'s value. It returns the value `$saw{$_}` had before the addition. `$_` is a variable grep sets in turn to each element of `@in`. "`!`" is logical "not", so the boolean value `$saw{$_}` had is inverted. That means grep only sees true when the hash hadn't seen the key yet. Thus, grep only passes along the first instance it sees of each element in `@in`.

This is an idiom. It seems complicated at first, but will soon be second nature to you. Sometimes it is written to increment `$saw{$_}` in a loop over `@in`, and then set `@out` to `keys(%saw)`, but that doesn't preserve the order of the elements. Yours does.

After Compline, Zaxo
Re^2: What does !$saw{$_}++ means by Anonymous Monk on Jan 26, 2005 at 09:56 UTC
"With no change to the idiom, I'd write that as, `my %saw; my @out = grep { ! $saw{$_}++ } @in;`"

But that extends the lifetime of `%saw`. If later in the same (or an inner) block you need to use the same construct, you have to use a different name for the hash, or do a `%saw = ()`, which will then cause errors if you remove the first construct (until you `my` the newer construct). I like to write it as:

`my @out = do {my %saw; grep !$saw{$_}++, @in};`

which doesn't leak the name of the temporary hash.
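A short sketch of why the `do` block helps: the same hash name can be reused for a second, unrelated de-duplication without clearing anything in between. The two input arrays are invented for the example.

```perl
use strict;
use warnings;

my @first  = qw(x y x);       # hypothetical inputs
my @second = qw(1 2 2 3);

# %saw is declared inside each do block, so it is created fresh both times
# and is not visible outside the block.
my @uniq_first  = do { my %saw; grep { !$saw{$_}++ } @first };
my @uniq_second = do { my %saw; grep { !$saw{$_}++ } @second };

print "@uniq_first\n";
print "@uniq_second\n";
```

Under `use strict`, referring to `%saw` after either statement would be a compile-time error, which is the point: the temporary hash never escapes.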
Re^2: What does !$saw{$_}++ means by blazar (Canon) on Jan 26, 2005 at 09:06 UTC
"With no change to the idiom, I'd write that as, `my %saw; my @out = grep { ! $saw{$_}++ } @in;`"

Personally I prefer to use the "EXPR-form" of grep() if possible. But this largely depends on the case under examination: in some cases, while it would be possible to use that form, it is still more terse to adopt the "BLOCK-form", as you did.

Definitely a good point about `my %saw;` instead. But after all, the snippet posted by the OP is too small to really tell whether he's using non-strict code or reusing a previously used %saw (I wouldn't do that, FWIW) or...
Re: What does !$saw{$_}++ means by Hena (Friar) on Jan 26, 2005 at 08:39 UTC
This is my conclusion of it. It can be wrong :). Well, it uses the %saw hash to keep track of how many times each input has passed (NOTE the return before increment). Grep gets a number when asking whether to keep or lose an element, and ! reverses the ok/fail answer. So when the first of several duplicate inputs gets there, $saw{$_}++ returns 0 and increments; ! reverses that to 1 and grep takes it in. The second already has a value, so $saw{$_}++ returns 1 (and increments), which ! reverses to 0, and grep drops it.
Re: What does !$saw{$_}++ means by ZlR (Chaplain) on Jan 26, 2005 at 08:46 UTC
Hello, here's my interpretation:

%saw is a hash.
$saw{$_} is the value in this hash associated with the key $_.
$saw{$_}++ increases this value by 1.
!$saw{$_} is a logical test which means "there is no value associated with the key $_ in %saw".

Now, this is used inside a grep applied to @in: this means that $_ will take each of the values in @in. Let's take the first value of @in: obviously it's not yet in %saw, so the conditional !$saw{$_} is TRUE (it's a double negation). Therefore grep validates and this first value goes into @out.

At this time a little magic happens: after evaluation of !$saw{$_} the ++ is applied. I'm only guessing this happens because of some precedence of ! over ++.

So what if the second element of @in is the same as the first? Well, since ++ happened, $saw{$_} will have a value of 1 and therefore !$saw{$_} will be FALSE: you will not get this repetition in the final @out.

Hope this helps, ZlR.

Question: I just checked in the camel book: the ! operator has an arity of 1 and is right associative; the ++ operator also has an arity of 1 but is not associative. I'm not sure then why !$saw{$_}++ is correctly evaluated, since they have the same arity.

Answer by ysth: nothing to do with arity, it's just that ++ has higher precedence than !.
Re^2: What does !$saw{$_}++ means by tphyahoo (Vicar) on Jan 26, 2005 at 09:05 UTC
!saw{$_} should have been !$saw{$_} (Maybe obvious, but since this is about deobfuscating an idiom, I thought I'd post anyway.)
Re^3: What does !$saw{$_}++ means by ZlR (Chaplain) on Jan 26, 2005 at 09:13 UTC
Yep! Corrected :)
Re^2: What does !$saw{$_}++ means by ysth (Canon) on Jan 26, 2005 at 10:12 UTC
I only have a Camel II, which doesn't list arity in the operator table, so I'm not sure exactly what you are seeing. However, arity has nothing to do with precedence; !$saw{$_}++ works because ++ has higher precedence than !.
Re^3: What does !$saw{$_}++ means by ZlR (Chaplain) on Jan 26, 2005 at 10:36 UTC
Yes, I checked again (Camel 3) and this time I understood the table correctly: precedence is shown by the way the operators are ordered, while "arity" is the number of arguments they can take. Thx.
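To see the precedence point in practice, here is a small sketch comparing the idiom with a fully parenthesised version. The input list and the hash names `%implicit` and `%explicit` are made up for the comparison.

```perl
use strict;
use warnings;

my @in = qw(a b a c b);   # hypothetical input

my (%implicit, %explicit);

# ++ binds tighter than !, so the idiom is parsed as !( $saw{$_}++ ).
my @without_parens = grep { !$implicit{$_}++ }   @in;
my @with_parens    = grep { !($explicit{$_}++) } @in;

print "without parens: @without_parens\n";
print "with parens:    @with_parens\n";
```

The other grouping, `(!$saw{$_})++`, would try to increment the result of `!`, which is not something that can be modified, so as far as I know perl refuses to compile it at all.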
Re: What does !$saw{$_}++ means by blazar (Canon) on Jan 26, 2005 at 09:00 UTC
Note: parens around `grep`'s args are not necessary, so it can be even more concise. ("Elegant", IMHO!)

You should really read

perldoc -f grep
perldoc perlsyn
perldoc perlop

Once you know the elementary characteristics of each subexpression of the above, combining them should not be too hard.

However, grep() will take all the elements from @in and evaluate the `!$saw{$_}++` expression for them ($_ is an alias to the actual element). On the first encounter of an item, `$saw{$_}` will be undef, so `$saw{$_}++` will return 0 and store one in it. At this point `!$saw{$_}++` will return 1 and the element under examination will be passed. A similar analysis for the case in which `$saw{$_}` already contains a (positive) number (i.e. on second, third, etc. encounter of an item) is left as an exercise to you.
Re: What does !$saw{$_}++ means by ikegami (Patriarch) on Jan 26, 2005 at 16:54 UTC
`!$saw{$_}++`

is

`!($saw{$_}++)`

is

`!(do { my $previously_seen = $saw{$_}; $saw{$_} += 1; $previously_seen })`

is

`do { my $previously_seen = $saw{$_}; $saw{$_} += 1; !$previously_seen }`

is a simplification of:

`do { my $previously_seen = $saw{$_}; $saw{$_} = 1; !$previously_seen }`

means (when in context of the grep)

If the string we're looking at is in %saw, return false. Otherwise, add that string to %saw and return true.

means (when in context of the grep)

Remove duplicates.
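The last expanded form can be dropped straight into a `grep` block. A minimal sketch, wrapped in a helper whose name (`uniq_in_order`) is hypothetical and not something from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: returns its arguments with duplicates removed,
# keeping the first occurrence of each value in its original position.
sub uniq_in_order {
    my %saw;
    return grep {
        my $previously_seen = $saw{$_};   # undef the first time we meet $_
        $saw{$_} = 1;                     # mark this string as seen
        !$previously_seen;                # true only for first occurrences
    } @_;
}

my @out = uniq_in_order(qw(red green red blue green));
print "@out\n";
```

This behaves like `grep { !$saw{$_}++ } @_`; the only difference is that the hash values stay at 1 instead of counting occurrences.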
Re: What does !$saw{$_}++ means by nwkcmk (Sexton) on Jan 27, 2005 at 02:34 UTC
Hi all, Thanks for the explanation. They have been very helpful.
