perl -ne '$_=lc;s/\W+/ /g;@w{split " "}=();END{$,="\n";print sort keys %w}'
Expanded, that becomes:
while (<>) { # for each input line
    $_=lc; # lowercase
    s/\W+/ /g; # map each run of non-word chars to a single space
    @wl=split " "; # take the words of this line (split " " skips leading whitespace, so no empty word)
    @w{@wl}=(); # put them as keys into a hash (undef values)
}
$,="\n"; # separate 'print' args with a newline
print sort keys %w; # print the sorted keys
Since hash keys are unique, it does what you need.
On the command line, without Perl, you can do it this way:
tr -cs '[:alnum:]' '\n' < textfile | tr '[:upper:]' '[:lower:]' | sort | uniq
The two tr commands do the "cleaning" and the lowercasing in a locale-dependent fashion. To get the same behaviour from the Perl one-liner above, add -Mlocale before the -ne.
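Because the tr character classes follow the locale, one way to get predictable results on plain ASCII text is to pin the locale to C for the whole pipeline. A sketch, assuming made-up sample text (the variable names are only for illustration):

```shell
# Hypothetical sample input.
sample_text='The cat saw the Cat.
Dogs chase cats!'

# With LC_ALL=C, [:alnum:] and [:upper:] cover ASCII letters and digits only,
# and sort orders by byte value.
words=$(printf '%s\n' "$sample_text" |
  LC_ALL=C tr -cs '[:alnum:]' '\n' |
  LC_ALL=C tr '[:upper:]' '[:lower:]' |
  LC_ALL=C sort | uniq)

printf '%s\n' "$words"
```

Drop the LC_ALL=C prefixes to get the locale-dependent behaviour back, for instance when the text contains accented letters.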
--
dakkar - Mobilis in mobile