http://qs321.pair.com?node_id=737371

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, what would be the best (that is, easiest to read) way to separate a string "word word word word" into an array of words? (Words are separated by space characters.)

Replies are listed 'Best First'.
Re: the best way to separate a string into words
by jeffa (Bishop) on Jan 19, 2009 at 19:16 UTC
      It'd probably be best to split using the magical ' '.
      my $string = " a b c "; # note leading/trailing whitespace
      my @array1 = split /\s+/, $string; # returns '', 'a', 'b', 'c' (4 items)
      my @array2 = split ' ', $string;   # returns 'a', 'b', 'c' (3 items)

      I'd go with the \W+ ... but I guess it all depends on how you define a word.
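
      A minimal sketch of how the two definitions of a word differ, assuming a made-up sample string (the variable names are only illustrative):

```perl
use strict;
use warnings;

# Hypothetical sample text containing punctuation and a contraction.
my $string = "Hello, world! It's fine.";

# Split on runs of non-word characters: punctuation is dropped,
# but the contraction "It's" is broken into "It" and "s".
my @by_nonword = split /\W+/, $string;

# Split on whitespace only: punctuation stays attached to each word.
my @by_space = split ' ', $string;

print join('|', @by_nonword), "\n";   # Hello|world|It|s|fine
print join('|', @by_space), "\n";     # Hello,|world!|It's|fine.
```

      Note that split /\W+/ would return a leading empty string if the text began with punctuation, just as split /\s+/ does with leading whitespace.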

      -derby

        I might too, but the OP defined words as "separated by space characters".

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        B--B--B--B--B--B--B--B--
        H---H---H---H---H---H---
        (the triplet paradiddle with high-hat)
        
Re: the best way to separate a string into words
by borisz (Canon) on Jan 19, 2009 at 19:19 UTC
    split is the easiest way.
    Boris
Re: the best way to separate a string into words
by Tanktalus (Canon) on Jan 20, 2009 at 00:13 UTC

    Generally, I use Text::ParseWords. But that's because I like giving the user the option to pass in word "some phrase" word and get only three "words" out of it, which allows a form of escaping for spaces that makes sense. This isn't much more complicated than using split ' ', $string, but it offers a lot of extra flexibility. Whether you do this depends on whether you want that flexibility.
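
    A small sketch of that difference, assuming the shellwords function from Text::ParseWords is the entry point you want (the module also provides quotewords and parse_line); the input string is just the example from above:

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# The example input: a quoted phrase between two plain words.
my $input = 'word "some phrase" word';

# Plain whitespace split: the quoted phrase is broken in two
# and the quote characters stay attached.
my @plain = split ' ', $input;

# shellwords honours the quotes and strips them from the result.
my @quoted = shellwords($input);

print scalar(@plain), " words from split\n";        # 4 words from split
print scalar(@quoted), " words from shellwords\n";  # 3 words from shellwords
```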

Re: the best way to separate a string into words (contractions)
by tye (Sage) on Jan 20, 2009 at 02:11 UTC

    For one definition of "split" and "words", I'd use:

    my @words= $string =~ /(\w+(?:'\w+)*)/g;

    Which would give you words like qw( split and words I'd use ) not like qw( "split" and "words", ) nor like qw( words I d use ).

    Update: Or even allow capturing hyphenated words:

    my @words= $string =~ /(\w+(?:[-']\w+)*)/g;
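
    To see the two patterns side by side, here is a short sketch run against a made-up sample string (the string and the array names are assumptions for illustration only):

```perl
use strict;
use warnings;

# Hypothetical sample text with quotes, a contraction, and a hyphenated word.
my $string = q{"split" and "words", I'd use well-known tricks};

# Apostrophes only: contractions stay whole, hyphenated words break apart.
my @contractions = $string =~ /(\w+(?:'\w+)*)/g;

# Apostrophes or hyphens: hyphenated words stay whole as well.
my @hyphenated = $string =~ /(\w+(?:[-']\w+)*)/g;

print join('|', @contractions), "\n";  # split|and|words|I'd|use|well|known|tricks
print join('|', @hyphenated), "\n";    # split|and|words|I'd|use|well-known|tricks
```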

    - tye        

Re: the best way to separate a string into words
by GrandFather (Saint) on Jan 19, 2009 at 19:46 UTC

    If you are doing natural language processing, you will find the Lingua modules very useful.


    Perl's payment curve coincides with its learning curve.