
Checking for a space followed by a word character is simple, though there are a few similar patterns, and one of them may serve your needs better than the others:

    $string =~ / \w/;   # a space followed by a word character
    $string =~ /\s\w/;  # any whitespace character followed by a word character
    $string =~ /\s\S/;  # any whitespace character followed by a non-whitespace character
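To see how the three patterns differ, here is a minimal sketch that runs each of them against a few sample strings. The helper name `which_match`, its labels, and the sample strings with a tab and a parenthesis are made up for illustration and are not part of the original question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: report which of the three patterns match a string.
sub which_match {
    my ($string) = @_;
    my @hits;
    push @hits, 'space+word' if $string =~ / \w/;   # literal space, then word character
    push @hits, 'ws+word'    if $string =~ /\s\w/;  # any whitespace, then word character
    push @hits, 'ws+nonws'   if $string =~ /\s\S/;  # any whitespace, then non-whitespace
    return join ',', @hits;
}

for my $s ('John', 'John ', 'John Doe', "John\tDoe", 'John (Doe)') {
    (my $shown = $s) =~ s/\t/\\t/g;   # make the tab visible in the output
    printf "%-12s => %s\n", "'$shown'", which_match($s) || 'none';
}
```

'John' and 'John ' match none of the patterns, 'John Doe' matches all three, the tab-separated version matches only the two `\s` patterns, and 'John (Doe)' matches only `/\s\S/` because `(` is not a word character.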

However, since you're applying a regex here, it might be just as efficient to go ahead and do the split and then see whether it split anything. That would take a bit more time on the lines that are a single word, but less time on the ones with multiple words:

    #!/usr/bin/env perl
    use 5.010;
    use strict;
    use warnings;

    my @s = ('John', 'John ', 'John Doe', 'John P. Doe'); # last 2 should match

    for (@s){
        my @v = split /\s+\b/; # split on whitespace followed by a word boundary
        if(@v > 1 ){           # if the split did any splitting
            say;               # do stuff with the line or elements
        }
    }
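To make it clearer why 'John ' is left alone while 'John P. Doe' is split, this small sketch prints the pieces that `split /\s+\b/` returns for each sample value. The wrapper name `split_words` is hypothetical. The pattern is the one used above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split on whitespace that is followed by a word boundary.
sub split_words {
    my ($line) = @_;
    return split /\s+\b/, $line;
}

for my $line ('John', 'John ', 'John Doe', 'John P. Doe') {
    my @v = split_words($line);
    printf "%-14s => %d piece(s): %s\n",
        "'$line'", scalar @v, join ' | ', map { sprintf '[%s]', $_ } @v;
}
```

The trailing space in 'John ' is followed by the end of the string rather than a word character, so there is no word boundary there and the value comes back as a single piece with its space intact. 'John P. Doe' comes back as three pieces.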

Update: I thought I'd benchmark it (code below), and found that if 50% of the values needed to be split as in the example above, the two methods were equally fast:

                     Rate split and check check and split
    split and check 145/s              --             -1%
    check and split 146/s              1%              --

But when I made it so 75% of the values needed to be split, the "split everything and then check for a second element" method was the clear winner:

                     Rate check and split split and check
    check and split 112/s              --            -17%
    split and check 136/s             21%              --

So it looks like if fewer than half of your lines will need to be split, you should check first and then split only the ones that matched. If more than half will end up being split, just split them all, check for a second element in the resulting array, and go from there. (Incidentally, checking for the second element ($v[1]) was also faster than checking the number of elements (@v > 1) as I originally did.) Here's the benchmarking code:

    #!/usr/bin/env perl
    use 5.010;
    use strict;
    use warnings;
    use Benchmark qw(:all);
    use Data::Printer;

    # my @s = ('John', 'John ', 'John Doe', 'John P. Doe') x 1000; # big array 50% need split
    my @s = ('John', 'John Poe', 'John Doe', 'John P. Doe') x 1000; # big array 75% need split

    cmpthese( 1000, {
        'split and check' => \&one,
        'check and split' => \&two,
    });

    sub one {
        for (@s){
            my @v = split /\s+\b/; # split on a space followed by a word boundary
            if($v[1] ){            # if the split did any splitting
                # do stuff with the line or elements
            }
        }
    }

    sub two {
        for (@s){
            if (/\s\b/){               # if the line would be split
                my @v = split /\s+\b/; # split it
                # do stuff with the line or elements
            }
        }
    }
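One caveat about testing `$v[1]` for truth, as the benchmark does: if the second piece is the string "0", the test is false even though the line was split. A minimal sketch of the difference follows. The helper name `has_second_word` and the sample 'Agent 0' are made up for illustration. Whether `defined $v[1]` keeps the speed advantage was not measured here.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: true when the split produced a second element,
# even when that element is the string "0".
sub has_second_word {
    my ($line) = @_;
    my @v = split /\s+\b/, $line;
    return defined $v[1] ? 1 : 0;
}

for my $line ('John', 'John Doe', 'Agent 0') {
    my @v = split /\s+\b/, $line;
    printf "%-10s truthy: %-3s  defined: %s\n",
        "'$line'",
        ($v[1] ? 'yes' : 'no'),                    # plain truth test on the second piece
        (has_second_word($line) ? 'yes' : 'no');   # defined-ness test on the second piece
}
```

For 'Agent 0' the truth test says no while the defined-ness test says yes. If your data can never have "0" as a second word, the plain `$v[1]` check is fine.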

Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.


In reply to Re: Check for Spaces in a String by aaron_baugher
in thread Check for Spaces in a String by Anonymous Monk
