http://qs321.pair.com?node_id=246024

maksl has asked for the wisdom of the Perl Monks concerning the following question:

dear fellows;

our c++ teacher taught that strings are the strength of any unix shell. he referred to The Art of Unix Programming by Eric Raymond and meant that text streams are easily put together.

well i would love to come up with a perl one-line, which prints out a sorted list of words from any textfile.
i got rid of special characters and lowercased everything, but didn't know how to sort the new stuff so i used sort and uniq in the shell, but i'm still getting duplicate entries:

perl -ne '$_ =~ s/\W/ /g;$_=~ s/\s+/\n/g;print lc' textfile |uniq|sort

thx for something more perlish
maksl

p.s.:i'm really excited en avance :)

Replies are listed 'Best First'.
Re: one liner to print out sorted list of word
by dakkar (Hermit) on Mar 26, 2003 at 18:37 UTC
    perl -ne '$_=lc;s/\W+/ /g;@w{split /\s+/}=();END{$,="\n";print sort keys %w}'

    That expanded becomes:

    while (<>) {             # for each input line
        $_=lc;               # lowercase
        s/\W+/ /g;           # maps all (seqs of) non-word chars to space
        @wl=split /\s+/;     # take the words of this line
        @w{@wl}=()           # put them as keys into a hash (undef values)
    }
    $,="\n";                 # separate 'print' args with a newline
    print sort keys %w;      # print the sorted keys

    Since hash keys are unique, it does what you need.

    On the command-line, without using Perl, you do it this way:

    tr -cs '[:alnum:]' '\n' < textfile |tr '[:upper:]' '[:lower:]'|sort|uniq

    The two trs perform the "cleaning" and "lowercasing" in a locale-dependent fashion. To get the same behaviour from the Perl one-liner above, add -Mlocale before the -ne.

    -- 
            dakkar - Mobilis in mobile
    
Re: one liner to print out sorted list of word
by graff (Chancellor) on Mar 26, 2003 at 18:35 UTC
    What you have would work if your pipe had sort and uniq in the other order:
    perl -ne ... | sort | uniq
    For that matter, you don't need uniq as a separate process:
    perl -ne ... | sort -u
    and of course, your perl one-line could do everything:
    perl -ne 's/\W+/ /g; for $w (split){$u{$w}++} END{print join $/,sort keys %u,""}'
    For that matter, you could also modify that slightly to print unique words with their frequencies of occurrence (by including the values of %u along with the keys), which is also handy.
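    A minimal sketch of that frequency variant, wrapped in a shell function so it is easy to try. The name word_freq and the sample sentence are made up for illustration; like the one-liner above, it does not lowercase its input:

    ```shell
    # word_freq is a hypothetical helper name; it reads files or stdin.
    word_freq() {
        # Count every word in %u, then print "word<TAB>count" in sorted key order.
        perl -ne 's/\W+/ /g; $u{$_}++ for split; END { print "$_\t$u{$_}\n" for sort keys %u }' "$@"
    }

    # Made-up sample input.
    printf 'to be, or not to be\n' | word_freq
    ```

    Piping the result through sort -k2,2nr would order it by count instead of alphabetically.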
Re: one liner to print out sorted list of word
by BrowserUk (Patriarch) on Mar 26, 2003 at 18:36 UTC

    Build a hash of the words to eliminate duplicates and sort the keys to order the output. (No shell required:)

    Update: I forgot the lowercase requirement.

    perl -nle"$words{lc$1}=undef while m[(\w+)]g; END{ print for sort keys %words }" <textfile

    (Use single quotes instead of double as needed)
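    The quoting matters because a POSIX shell expands $words and $1 inside double quotes before Perl ever sees them, while single quotes pass the program through untouched. A sketch of the single-quoted form, assuming a POSIX shell; the function name unique_words and the sample text are made up for illustration:

    ```shell
    # unique_words is a hypothetical wrapper around the one-liner above.
    unique_words() {
        # Single quotes keep $words and $1 away from the shell.
        perl -nle '$words{lc $1} = undef while m[(\w+)]g; END { print for sort keys %words }' "$@"
    }

    # Made-up sample input with mixed case and punctuation.
    printf 'The cat. the Dog!\n' | unique_words
    ```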


    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.
Re: one liner to print out sorted list of word
by zby (Vicar) on Mar 26, 2003 at 18:34 UTC
    First sort then uniq: sort|uniq.
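    The reason is that uniq only collapses duplicate lines that are adjacent, so it needs sorted input. A small sketch with made-up sample words:

    ```shell
    # Hypothetical sample input: "b" appears twice, but not on adjacent lines.
    words() { printf 'b\na\nb\n'; }

    # uniq first: no duplicates are adjacent, so both "b" lines survive and are then sorted.
    wrong=$(words | uniq | sort)

    # sort first: duplicates become adjacent, so uniq can collapse them.
    right=$(words | sort | uniq)

    printf 'uniq|sort:\n%s\n' "$wrong"
    printf 'sort|uniq:\n%s\n' "$right"
    ```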
Re: one liner to print out sorted list of word
by MrYoya (Monk) on Mar 26, 2003 at 18:39 UTC
    Here's one way, it's case sensitive:
    perl -e 'open(FILE, $ARGV[0]); print sort <FILE>' text
    Also, try |sort|uniq to get rid of duplicate entries.
Re: one liner to print out sorted list of word (sort(1))
by Aristotle (Chancellor) on Mar 30, 2003 at 20:26 UTC
    Why is everyone trying to reinvent the wheel?
    sort -u file     # case sensitive
    sort -fu file    # insensitive
    That takes care of lines. To change the interpretation of what a line is, use tr(1), for example:
    < file tr [:blank:] '\n' | sort -fu
    Shell still excels at really simple things it has dedicated tools for. Perl beats it if you want to do something there's no exactly matching tool for.

    Makeshifts last the longest.

      Isn't ShellMonks up the road somewhere?

      thx a lot Aristotle for your "_minimalistic_" reply :)
      update: second even shorter line with code from dakkar
      < file tr [:blank:],[:punct:] '\n' | sort -fu
      < file tr -cs '[:alnum:]' '\n'|sort -fu
      is exactly what i was looking for!!!
      never saw the file redirector < at the beginning of a shell command; i use it for things like:
      mail -s "test" person@what_ever.org < file
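      The shell accepts a redirection anywhere in a simple command, so the leading and trailing forms behave the same. A small sketch that checks this, using a temporary file with made-up contents:

      ```shell
      # Hypothetical sample file, created only for this demo.
      tmp=$(mktemp)
      printf 'Pear apple\napple, pear!\n' > "$tmp"

      # Redirection written before the command name.
      leading=$(< "$tmp" tr -cs '[:alnum:]' '\n' | sort -fu)

      # The same redirection written after the command's arguments.
      trailing=$(tr -cs '[:alnum:]' '\n' < "$tmp" | sort -fu)

      rm -f "$tmp"

      [ "$leading" = "$trailing" ] && echo "both forms give the same word list"
      ```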
Re: one liner to print out sorted list of word
by maksl (Pilgrim) on Mar 28, 2003 at 09:15 UTC