http://qs321.pair.com?node_id=28747

eduardo has asked for the wisdom of the Perl Monks concerning the following question:

maverick and I were talking at lunch, trying to figure out the longest word that could be typed solely with the left hand. I am not a Perl master, and more specifically I am certainly not a regexp master, so I never think of solutions in terms of regular expressions. My solution was:
    sub without_regexp {
        my $largest = "";

        # create my set
        my %set = ( );
        foreach (qw( y h n u j m i k o l p )) {
            $set{$_} = 1;
        }

        # open file, and for every word that is input, if it's
        # longer than the current longest, for every character, if
        # it is in the set, jump out, otherwise, if all characters
        # are NOT in the set (the word had none of the offending
        # letters), then make it the largest...
        open (INFILE, "</usr/dict/words") || die "error $!";
        while (<INFILE>) {
            chomp;
            if (length($_) > length $largest) {
                if (! scalar grep { $set{$_} } split('', lc($_))) {
                    $largest = $_;
                }
            }
        }
        close (INFILE) || die "error $!";
        print "LARGEST FOUND: $largest\n";
    }
It gave me an answer that seems plausible enough (although I haven't bothered to check it): aftereffect. None of its letters are in the set of letters I gave it. maverick, however, being the super genius that he is, said: "I would just make a character class of all of the characters in the right hand and negate it with ^; if the word matched that, it didn't have any of those characters, so it was a possible solution." His code, as I understood it, would look like this (well, his would be nicer than this, but this is a good guess):
    sub with_regexp {
        my $largest = "";

        # for every line in the file, if its length is greater
        # than the current longest, and it matches the set that
        # is all of the characters in the RIGHT hand, not'ed, then
        # it is the largest
        open (INFILE, "</usr/dict/words") || die "error $!";
        while (<INFILE>) {
            chomp;
            if (length($_) > length $largest) {
                if (/[^yhnujmikolp]/i) {
                    $largest = $_;
                }
            }
        }
        close (INFILE) || die "error $!";
        print "LARGEST FOUND: $largest\n";
    }
It, however, gave me the answer antidisestablishmentarianism, which clearly has letters in that set (n, i, etc.)! So I thought maybe my logic was wrong... what happens if I just negate the test, like so:
    sub with_regexp {
        my $largest = "";

        # for every line in the file, if its length is greater
        # than the current longest, and it matches the set that
        # is all of the characters in the RIGHT hand, not'ed, then
        # it is the largest
        open (INFILE, "</usr/dict/words") || die "error $!";
        while (<INFILE>) {
            chomp;
            if (length($_) > length $largest) {
                # NOTICE THE ! AT THE BEGINNING
                if (! /[^yhnujmikolp]/i) {
                    $largest = $_;
                }
            }
        }
        close (INFILE) || die "error $!";
        print "LARGEST FOUND: $largest\n";
    }
Then the answer it gives me is Honolulu, a word whose letters are all within that set... so, in other words, the original test should have worked! OK... so why are the first two sections of code not equivalent, and why does the negated version do exactly what I would expect it to, but not the original? Thanks!

Replies are listed 'Best First'.
Re: character classes
by merlyn (Sage) on Aug 21, 2000 at 03:28 UTC
    The second program says yes if there is at least one character not in the class, not if all characters are not in the class.

    You want something like:

    ... if (/^[^yhnujmikolp]+$/i) { ...
    which says "from the beginning to the ending, are they all not in this class".
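
    Running the three tests from this thread on the three words that came up makes the difference concrete. This is a minimal sketch, assuming the question's right-hand set (yhnujmikolp); the sub names are made up for illustration and are not from any post:

```perl
use strict;
use warnings;

# Right-hand QWERTY letters, the same set as in the question.
my $right = 'yhnujmikolp';

# Unanchored negated class: true if AT LEAST ONE character is outside the set.
sub some_char_outside { return $_[0] =~ /[^$right]/i ? 1 : 0 }

# Anchored negated class: true only if EVERY character is outside the set.
sub all_chars_outside { return $_[0] =~ /^[^$right]+$/i ? 1 : 0 }

# Negating the unanchored test: true if NO character is outside the set,
# i.e. the word can be typed entirely with the right hand.
sub no_char_outside { return some_char_outside($_[0]) ? 0 : 1 }

for my $w (qw(antidisestablishmentarianism aftereffect Honolulu)) {
    printf "%-28s some=%d all=%d none=%d\n", $w,
        some_char_outside($w), all_chars_outside($w), no_char_outside($w);
}
```

    For antidisestablishmentarianism the unanchored test is true (it contains an a, among others) while the anchored test is false. The negated unanchored test is true only for Honolulu, which is why that version of the program found a right-hand word instead of a left-hand one.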

    This is a common problem when people negate things too much: the negations stack up in the wrong direction. Me, I avoid that kind of regex, and ask for what I want, which will also be faster:

        @ARGV = qw(/usr/dict/words);
        my $longest = "";
        while (<>) {
            next unless length > length $longest;   # too short
            next if /[^qwertasdfgzxcvb\n]/i;         # wrong hand
            $longest = $_;
        }
        print $longest;
    And this finds "aftereffect", as you noted. But with a lot less code, as I'm often prone to do. {grin}

    -- Randal L. Schwartz, Perl hacker

      In my dictionary file, there are multiple left-hand-only words sharing the longest length. The following code outputs all of them.

          @ARGV = qw(/usr/dict/words);
          my $len = 0;
          my @largest;
          while (<>) {
              chomp;
              next unless length >= $len && /^[qazwsxedcrfvtgb]+$/i;
              if (length > $len) {
                  @largest = ();
                  $len = length;
              }
              push @largest, $_;
          }
          print join(', ', @largest), $/;

      This gives the full set of answers: 'aftereffect', 'desegregate', 'exacerbated', 'exacerbates', 'exaggerated', 'exaggerates', 'reverberate', and 'vertebrates'.

Re: character classes
by chromatic (Archbishop) on Aug 21, 2000 at 05:19 UTC
    Here's another approach that's not necessarily better, but has a certain charm anyway:
        #!/usr/bin/perl -w
        use strict;

        @ARGV = qw(/usr/dict/words);
        my $longest = '';
        while (<>) {
            chomp;
            next unless length > length $longest;
            if (length == tr/qwertasdfgzxcvb/qwertasdfgzxcvb/) {
                $longest = $_;
            }
        }
        print $longest;
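
    This approach relies on tr/// returning the number of characters it matched. With identical search and replacement lists, or with an empty replacement list (which Perl treats as a copy of the search list), tr counts without changing the string. A minimal sketch of the same check as reusable subs; the helper names are hypothetical, and uppercase letters are added to the list as an assumption, because the tr above is case-sensitive and would skip capitalized words such as proper nouns:

```perl
use strict;
use warnings;

# tr/// returns how many characters matched; with an empty replacement
# list (and no /d), it counts them without modifying the string.
sub left_hand_count {
    my ($word) = @_;
    return ($word =~ tr/qwertasdfgzxcvbQWERTASDFGZXCVB//);
}

# A word is left-hand-only when every one of its characters was counted.
sub left_hand_only {
    my ($word) = @_;
    return length($word) > 0 && left_hand_count($word) == length($word);
}

for my $w (qw(aftereffect Stewardesses monopoly)) {
    printf "%s: %d of %d letters\n", $w, left_hand_count($w), length $w;
}
```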
Re: character classes
by young perlhopper (Scribe) on Aug 21, 2000 at 05:53 UTC
    I believe stewardesses is actually the longest word that can be typed with the left hand.

    And I can't take much credit for this; I just got it from some stupid trivia tidbit rather than writing Perl to figure it out.

    -Mark
    mlogan@ccs.neu.edu

      'stewardesses' is not in /usr/dict/words, though. Neither are 'aftereffects', 'reverberated', 'reverberates', 'desegregated', or 'desegregates', all of which are also 12 letters, the same as 'stewardesses'.

      If you really want to push it, you could include 'desegregaters', which is 13 letters (although the real word may be spelled 'desegregators', in which case it isn't left-hand-only). I was unable to find either 'desegregator' or 'desegregater' with Google, but I would imagine many English speakers would find it a grammatical word if someone used it in conversation. Does anyone have the Oxford handy?
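
      Claims like these can be checked without a dictionary file. A minimal sketch, assuming the left-hand set used elsewhere in the thread (qwertasdfgzxcvb); right_hand_letters is a hypothetical helper that lists the letters in a word that would need the right hand:

```perl
use strict;
use warnings;

# Left-hand QWERTY letters, the same set used by the other replies.
my $left = 'qwertasdfgzxcvb';

# Hypothetical helper: in list context, returns every character of $word
# that is NOT a left-hand letter (an empty list means left-hand-only).
sub right_hand_letters {
    my ($word) = @_;
    return $word =~ /([^$left])/gi;
}

for my $w (qw(stewardesses aftereffects reverberated reverberates
              desegregated desegregates desegregaters desegregators)) {
    my @off = right_hand_letters($w);
    printf "%-14s %2d letters  %s\n", $w, length $w,
        @off ? "needs right hand: @off" : 'left hand only';
}
```

      The helper must be called in list context (as in the loop) to get the offending letters; in scalar context the /g match only reports whether there is at least one.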

Re: character classes
by Anonymous Monk on Aug 21, 2000 at 16:40 UTC
    While not adding anything to what has been said, this one-liner is what I thought up on seeing the question:

        perl -ne 'print,$len=length if !/[qwertasdfgzxcvb]/i and length>=$len' /usr/dict/words

    Oh, and another right-hand word of the same length as Honolulu is monopoly.
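
    The one-liner prints every right-hand word that ties or beats the running maximum, so shorter words seen earlier in the file are printed too. A minimal sketch that keeps only the final longest words, for either hand, in the style of the "outputs all of them" reply; longest_typeable is a hypothetical helper, and the word lists are small in-memory samples rather than /usr/dict/words:

```perl
use strict;
use warnings;

# Hypothetical helper: return every word of maximal length whose letters
# all come from $letters (case-insensitive). $letters is assumed to hold
# plain letters only, since it is interpolated into a character class.
sub longest_typeable {
    my ($letters, @words) = @_;
    my $len = 0;
    my @best;
    for my $w (@words) {
        next unless $w =~ /^[$letters]+$/i;   # wrong hand
        next if length($w) < $len;            # too short
        if (length($w) > $len) {              # new record: drop old ties
            $len  = length $w;
            @best = ();
        }
        push @best, $w;
    }
    return @best;
}

my @words = qw(Honolulu monopoly pumpkin hello aftereffect exaggerated);
print join(', ', longest_typeable('yhnujmikolp', @words)), "\n";     # prints "Honolulu, monopoly"
print join(', ', longest_typeable('qwertasdfgzxcvb', @words)), "\n"; # prints "aftereffect, exaggerated"
```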
Re: character classes
    by Adam (Vicar) on Aug 22, 2000 at 00:18 UTC
    This thread is a great example of why I like the Perl Monastery. The people here are not only knowledgeable and friendly, but they enjoy playing with Perl too.
Re: character classes
by Anonymous Monk on Aug 21, 2000 at 05:57 UTC
    Too bad it doesn't find the word 'stewardesses', which is longer than all of those other solutions. Moral of the story: don't trust the computer.
      Heh, the moral of the story is get a better dictionary =P
      http://www.dcs.shef.ac.uk/research/ilash/Moby/index.html

      --
      $you = new YOU;
      honk() if $you->love(perl)