note
dakkar
<code>
perl -ne '$_=lc;s/\W+/ /g;@w{split " "}=();END{$,="\n";print sort keys %w}'
</code>
<p>Expanded, that becomes:</p>
<code>
while (<>) { # for each input line
$_=lc; # lowercase
s/\W+/ /g; # replace each run of non-word chars with one space
@wl=split " "; # take the words of this line (split " " skips leading whitespace, so no empty word)
@w{@wl}=(); # put them as keys into a hash (undef values)
}
$,="\n"; # separate 'print' args with a newline
print sort keys %w; # print the sorted keys
</code>
<p>Since hash keys are unique, it does what you need.</p>
<p>On the command line, without Perl, you can do it this way:</p>
<code>
tr -cs '[:alnum:]' '\n' < textfile |tr '[:upper:]' '[:lower:]'|sort|uniq
</code>
<p>The two <tt>tr</tt>s perform the "cleaning" and "lowercasing" in a locale-dependent fashion. To get the same behaviour from the Perl one-liner above, add <tt>-Mlocale</tt> before the <tt>-ne</tt>.</p>
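<p>To see what the <tt>tr</tt> pipeline does, here is a small sketch on made-up input. The function name <tt>unique_words</tt> and the sample text are only for illustration, <tt>sort -u</tt> stands in for <tt>sort|uniq</tt>, and the locale is pinned to <tt>C</tt> just so the result is reproducible (drop that to get the locale-dependent behaviour described above):</p>

```shell
# Hypothetical helper: read text on stdin, print each distinct word once,
# lowercased and sorted. LC_ALL=C is an assumption made only to keep the
# example deterministic; remove it to honour the current locale.
unique_words() {
  LC_ALL=C tr -cs '[:alnum:]' '\n' |      # every run of non-alphanumerics becomes one newline
    LC_ALL=C tr '[:upper:]' '[:lower:]' | # lowercase everything
    LC_ALL=C sort -u                      # sort and drop duplicates (same as sort|uniq)
}

# Sample input (made up for this example)
printf 'The cat saw the Cat.\nThe dog!\n' | unique_words
```

<p>With that input it should print <tt>cat</tt>, <tt>dog</tt>, <tt>saw</tt> and <tt>the</tt>, one per line. Note that <tt>[:alnum:]</tt> treats an underscore as a separator, while Perl's <tt>\W</tt> counts it as part of a word, so the two approaches can differ on input like <tt>foo_bar</tt>.</p>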
<pre>--
dakkar - Mobilis in mobile
</pre>
246024