http://qs321.pair.com?node_id=246038


in reply to one liner to print out sorted list of word

perl -ne '$_=lc;s/\W+/ /g;@w{split /\s+/}=();END{$,="\n";print sort keys %w}'

Expanded, that becomes:

while (<>) {              # for each input line
    $_ = lc;              # lowercase it
    s/\W+/ /g;            # map every run of non-word chars to a single space
    @wl = split /\s+/;    # take the words of this line
    @w{@wl} = ();         # put them into a hash as keys (undef values)
}
$, = "\n";                # separate 'print' args with a newline
print sort keys %w;       # print the sorted keys

Since hash keys are unique, it does what you need.

On the command line, without using Perl, you can do it this way:

tr -cs '[:alnum:]' '\n' < textfile | tr '[:upper:]' '[:lower:]' | sort | uniq

The two tr commands do the "cleaning" and the "lowercasing" in a locale-dependent fashion. To get the same behaviour from the Perl one-liner above, add -Mlocale before the -ne.

-- 
        dakkar - Mobilis in mobile