PerlMonks  

split every second word

by Anonymous Monk
on Dec 04, 2003 at 15:25 UTC ( [id://312216]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I need a lil' help with this text manipulation

I've got names in a list like the following:

Bob Builder Tinky Winky Bugs Bunny Mickey Mouse

and I've basically got put them on separate lines. I think split can do this, and will store it in an array, which is what I want, but I have no idea how to split on every other word. I was also wondering if anybody knows how to remove spaces from the end of words

so if i had "hello " it would give me "hello"...will s// /; do it?

Thank you all

Replies are listed 'Best First'.
Re: split every second word
by broquaint (Abbot) on Dec 04, 2003 at 15:32 UTC
    If you're working with simple words then a quick match should do the trick e.g
    my $str = "Bob Builder Tinky Winky Bugs Bunny Mickey Mouse";
    my @names = $str =~ /(\w+ \w+)/g;
    print "[$_]\n" for @names;
    __output__
    [Bob Builder]
    [Tinky Winky]
    [Bugs Bunny]
    [Mickey Mouse]
    See perlre and perlop for more info.
    HTH

    _________
    broquaint
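The `\w+ \w+` pattern above assumes plain word characters separated by exactly one space. A minimal sketch of a more tolerant variant follows, assuming names may contain characters such as apostrophes or hyphens and may be separated by runs of whitespace. `pair_names` is a hypothetical helper name, not something from the thread, and a trailing unpaired word is silently dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: return consecutive pairs of whitespace-separated tokens.
sub pair_names {
    my ($str) = @_;
    # \S+ matches any run of non-whitespace, so "O'Brien" stays in one piece.
    my @pairs = $str =~ /(\S+\s+\S+)/g;
    # Normalise the separator inside each pair to a single space.
    s/\s+/ / for @pairs;
    return @pairs;
}

print "[$_]\n" for pair_names("Bob Builder  Tinky Winky Conan O'Brien");
```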

Re: split every second word
by shockme (Chaplain) on Dec 04, 2003 at 15:56 UTC
    broquaint is pretty much right on. Given your example input, it'll work.

    As to removing spaces, the following will do it:

    my $str = "Bob ";
    $str =~ s/\s$//; # or s/ $//;
    print "|$str|\n";
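Note that `\s$` removes only a single trailing whitespace character. If the string may end in several spaces, a `+` quantifier strips the whole run. A small sketch of the difference (`rtrim` is a hypothetical helper name):

```perl
use strict;
use warnings;

# Hypothetical helper: strip every trailing whitespace character.
sub rtrim {
    my ($s) = @_;
    $s =~ s/\s+$//;   # \s+ removes the whole trailing run, not just one character
    return $s;
}

my $padded = "Bob" . (" " x 3);     # "Bob" followed by three spaces
(my $one = $padded) =~ s/\s$//;     # single \s: only one space is removed
print "|$one|\n";                   # two trailing spaces remain
print "|", rtrim($padded), "|\n";   # no trailing spaces remain
```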

    If things get any worse, I'll have to ask you to stop helping me.

Re: split every second word
by Art_XIV (Hermit) on Dec 04, 2003 at 16:24 UTC
    use strict;

    while (<DATA>) {
        print ">", trim($_), "<\n";
    }

    sub trim {
        my ($text) = @_;
        $text =~ s/^\s+|\s+$//g; #remove leading/trailing whitespace
        return $text;
    }

    __DATA__
    Cowboy Bebop
    Bubblegum Crisis
    Big O
    Dragonball Z
    Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"
      $text =~ s/^\s+|\s+$//g;

      This can be hopelessly inefficient (the regexp engine gets bogged down in the middle of the string, looking for hypothetical end anchors). The longer the string gets, the better it is to write:

      $text =~ s/^\s+//g;
      $text =~ s/\s+$//g;

      Consider the following, somewhat pathological cases:

      #! /usr/local/bin/perl -w

      use Benchmark qw/:all/;

      my $long  = ' aaa bbb ccc' . (' ' x 100).'ggg hhh ';
      my $short = ' aaa bbb ccc ddd eee fff ggg hhh ';

      sub one_long {
          my $s = $long;
          $s =~ s/^\s+|\s+$//g;
          $s;
      }

      sub one_short {
          my $s = $short;
          $s =~ s/^\s+|\s+$//g;
          $s;
      }

      sub two_long {
          my $s = $long;
          $s =~ s/^\s+//g;
          $s =~ s/\s+$//g;
          $s;
      }

      sub two_short {
          my $s = $short;
          $s =~ s/^\s+//g;
          $s =~ s/\s+$//g;
          $s;
      }

      print "tests:\n";
      {
          no strict 'subs';
          print "$_ [", &$_, "]\n"
              for qw/one_long one_short two_long two_short/;
      }

      cmpthese( shift || 1000, {
          one_long  => \&one_long,
          one_short => \&one_short,
          two_long  => \&two_long,
          two_short => \&two_short,
      } );

      __PRODUCES__
      tests:
      one_long [aaa bbb ccc
      + ggg hhh]
      one_short [aaa bbb ccc ddd eee fff ggg hhh]
      two_long [aaa bbb ccc
      + ggg hhh]
      two_short [aaa bbb ccc ddd eee fff ggg hhh]
      Benchmark: timing 100000 iterations of one_long, one_short, two_long, two_short...
        one_long:  8 wallclock secs ( 7.13 usr +  0.00 sys =  7.13 CPU) @ 14019.72/s (n=100000)
       one_short:  2 wallclock secs ( 1.62 usr +  0.00 sys =  1.62 CPU) @ 61835.75/s (n=100000)
        two_long:  1 wallclock secs ( 0.62 usr +  0.00 sys =  0.62 CPU) @ 160000.00/s (n=100000)
       two_short:  0 wallclock secs ( 0.63 usr +  0.00 sys =  0.63 CPU) @ 158024.69/s (n=100000)
                    Rate  one_long one_short two_short  two_long
      one_long   14020/s        --      -77%      -91%      -91%
      one_short  61836/s      341%        --      -61%      -61%
      two_short 158025/s     1027%      156%        --       -1%
      two_long  160000/s     1041%      159%        1%        --

      It should be obvious from the results that it's good insurance to write it in the two s/// form and be done with it :)
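As a sketch, the two-substitution form can be wrapped in a small helper like the one below. The name `trim` mirrors the earlier reply, and the `/g` flag is left off on the assumption that each anchored pattern can match at most once per string.

```perl
use strict;
use warnings;

# Trim helper using two anchored substitutions instead of one alternation.
sub trim {
    my ($text) = @_;
    $text =~ s/^\s+//;   # leading whitespace
    $text =~ s/\s+$//;   # trailing whitespace, including a final newline
    return $text;
}

print ">", trim("   Cowboy Bebop  \n"), "<\n";
```

Because the sub works on a copy of its argument, the caller's string is left untouched and the trimmed value comes back as the return value.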

        Grinder - Thanks for the correction and the downloadable code!

        I have used the 'one_long' regex way too many times without realizing there is a better way. Thanks for the enlightenment!

        Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"
Re: split every second word
by Anonymous Monk on Dec 04, 2003 at 23:57 UTC
    so if i had "hello " it would give me "hello"...will s// /; do it?
    Did you try it? Observe:
    $_=1234;
    s// /;
    print
    __END__
     1234
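To spell out what happens there: with no earlier successful match in the program, the empty pattern matches the empty string at the very start, so `s// /` inserts a space at the front instead of removing one from the end. A short sketch comparing it with an anchored substitution (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $s = "hello ";
(my $wrong = $s) =~ s// /;      # empty pattern matches at position 0; a space is inserted
(my $right = $s) =~ s/\s+$//;   # anchored at the end; trailing whitespace is removed
print "[$wrong]\n[$right]\n";
```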
Re: split every second word
by duff (Parson) on Dec 05, 2003 at 00:21 UTC

    Not that it helps your particular problem (besides, I think you've been adequately answered), but this would be easy to implement using perl 6 in terms similar to how you framed your question:

    @names = split m:each:2nd/\s+/, $string;

    I.e., split the $string on each 2nd occurrence of one or more whitespace characters.
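A rough Perl 5 approximation of the same idea, as a sketch only: split on whitespace first, then glue the words back together two at a time. `split_every_2nd` is a hypothetical helper name, and keeping a trailing odd word on its own is an assumption about the desired behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: split on whitespace, then rejoin the tokens in pairs.
sub split_every_2nd {
    my ($string) = @_;
    my @words = split ' ', $string;   # ' ' splits on whitespace runs and skips leading whitespace
    my @names;
    while (my @pair = splice(@words, 0, 2)) {
        push @names, join(' ', @pair);   # a trailing odd word is kept on its own
    }
    return @names;
}

print "$_\n" for split_every_2nd("Bob Builder Tinky Winky Bugs Bunny Mickey Mouse");
```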

Re: split every second word
by jweed (Chaplain) on Dec 04, 2003 at 15:43 UTC

    Well, I'm not entirely sure about the nature of your text. If it is in a file, one name to each line like you say it is (well you say that they basically are):

    Bob Builder
    Tinky Winky
    etc.
    then you can simply open the file and read it into an array like this:
    my @names = <FH>;
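Lines read this way keep their trailing newline, so a chomp is usually wanted afterwards. A self-contained sketch, using an in-memory filehandle as a stand-in for the real file (the sample data and variable names are assumptions for illustration):

```perl
use strict;
use warnings;

# Stand-in for a real file: an in-memory filehandle with one name per line.
my $data = "Bob Builder\nTinky Winky\n";
open(my $fh, '<', \$data) or die "Cannot open: $!";
my @names = <$fh>;   # list context reads every remaining line
close $fh;
chomp @names;        # drop the trailing newline from each element
print "[$_]\n" for @names;
```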

    If they are all on one line like your example, it is a bit trickier. But, since this seems like it might be a homework problem, I'll have you figure it out.

    Update: Well, I guess I misread your original post. You left out the "to" in "got to put them on separate lines". Don't mind me.


    Who is Kayser Söze?

Node Type: perlquestion [id://312216]
Approved by TheHobbit