Re: Check for Spaces in a String

by aaron_baugher (Curate)
on Jun 15, 2015 at 20:52 UTC (#1130529)


in reply to Check for Spaces in a String

Checking for a space followed by a word character is simple, though there are a few similar patterns, and one of the others might serve your needs better:

    $string =~ / \w/;   # a space followed by a word character
    $string =~ /\s\w/;  # any whitespace character followed by a word character
    $string =~ /\s\S/;  # any whitespace character followed by a non-whitespace character
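To see how the three patterns differ in practice, here is a small sketch that runs each of them against a few sample strings. The helper name `matching_patterns`, the labels, and the sample strings with a tab and a parenthesis are my own assumptions for illustration, not part of the original question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns a label for each of the three patterns
# that matches the given string.
sub matching_patterns {
    my ($string) = @_;
    my @hits;
    push @hits, 'space+word'          if $string =~ / \w/;
    push @hits, 'whitespace+word'     if $string =~ /\s\w/;
    push @hits, 'whitespace+nonspace' if $string =~ /\s\S/;
    return @hits;
}

for my $s ('John', 'John ', 'John Doe', "John\tDoe", 'John (Jr.)') {
    my $label = join(', ', matching_patterns($s)) || 'no match';
    print "'$s' => $label\n";
}
```

A trailing space matches none of them, a tab only matches the two `\s` patterns, and a space followed by punctuation only matches `/\s\S/`.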

However, since you're applying a regex here, it might be just as efficient to go ahead and do the split and then see whether it split anything. That would take a bit more time on the lines that are a single word, but less time on the ones with multiple words:

    #!/usr/bin/env perl
    use 5.010;
    use strict;
    use warnings;

    my @s = ('John', 'John ', 'John Doe', 'John P. Doe'); # last 2 should match
    for (@s){
        my @v = split /\s+\b/;   # split on whitespace followed by a word boundary
        if(@v > 1 ){             # if the split did any splitting
            say;                 # do stuff with the line or elements
        }
    }
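If it helps to see what that split actually returns, the sketch below prints the pieces for each sample line. The wrapper `split_on_word_start` and the extra `'John (Jr.)'` sample are assumptions added for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical wrapper around the same split used above.
sub split_on_word_start {
    my ($line) = @_;
    return split /\s+\b/, $line;
}

for my $line ('John', 'John ', 'John Doe', 'John P. Doe', 'John (Jr.)') {
    my @parts = split_on_word_start($line);
    # Quote each piece so leftover whitespace is visible.
    printf "%-15s => %d part(s): %s\n",
        "'$line'", scalar @parts, join(' | ', map { "'$_'" } @parts);
}
```

Note that trailing whitespace is not followed by a word boundary, so `'John '` comes back as a single element with the space still attached, and a space followed by punctuation does not split either.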

Update: I thought I'd benchmark it (code below), and found that if 50% of the values needed to be split as in the example above, the two methods were equally fast:

                      Rate split and check check and split
    split and check 145/s              --             -1%
    check and split 146/s              1%              --

But when I made it so 75% of the values needed to be split, the "split everything and then check for a second element" method was the clear winner:

                      Rate check and split split and check
    check and split 112/s              --            -17%
    split and check 136/s             21%              --

So it looks like if fewer than half your lines will need to be split, you should check first, then split the ones that matched. If more than half will end up being split, just split them all, check for a second element in the resulting array, and go from there. (Incidentally, checking for the second element ($v[1]) was also a gain over checking the number of elements (@v>1) as I originally did.) Here's the benchmarking code:

    #!/usr/bin/env perl
    use 5.010;
    use strict;
    use warnings;
    use Benchmark qw(:all);
    use Data::Printer;

    # my @s = ('John', 'John ', 'John Doe', 'John P. Doe') x 1000; # big array 50% need split
    my @s = ('John', 'John Poe', 'John Doe', 'John P. Doe') x 1000; # big array 75% need split

    cmpthese( 1000, {
        'split and check' => \&one,
        'check and split' => \&two,
    });

    sub one {
        for (@s){
            my @v = split /\s+\b/;   # split on whitespace followed by a word boundary
            if($v[1] ){              # if the split did any splitting
                # do stuff with the line or elements
            }
        }
    }

    sub two {
        for (@s){
            if (/\s\b/){                 # if the line would be split
                my @v = split /\s+\b/;   # split it
                # do stuff with the line or elements
            }
        }
    }
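One caveat about the `$v[1]` test mentioned above: it is a truthiness check, so a second element of `'0'` would be treated as "no split". That probably never happens with names, but if the data could contain it, `defined $v[1]` is a safer variant. The sketch below compares the two; the function names and the `'Agent 0'` sample are made up for illustration, and I have not benchmarked the `defined` version.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helpers: both split the same way, but test the
# second element differently.
sub second_is_true {
    my @v = split /\s+\b/, $_[0];
    return $v[1] ? 1 : 0;            # false when the second piece is '0'
}

sub second_is_defined {
    my @v = split /\s+\b/, $_[0];
    return defined($v[1]) ? 1 : 0;   # true for any second piece, including '0'
}

for my $line ('John Doe', 'Agent 0', 'John') {
    printf "%-10s truthy=%d defined=%d\n",
        "'$line'", second_is_true($line), second_is_defined($line);
}
```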

Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.
