PerlMonks  

simpler regex

by rsiedl (Friar)
on May 10, 2007 at 07:48 UTC ([id://614556]=perlquestion)

rsiedl has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I'm trying to write a regex that adds a period after any single-character word in a string,
e.g. "Bob J Smith" becomes "Bob J. Smith", "A Jones" becomes "A. Jones", etc.

I've managed this with:

sub fullname {
    my (@parts) = @_;
    my $name = join(" ", @parts);
    $name =~ s/^(.) /$1\. /g;     # single character at the start
    $name =~ s/ (.) / $1\. /g;    # single character in the middle
    $name =~ s/ (.)$/ $1\./g;     # single character at the end
    return($name);
} # end-sub

but I was trying to get it down to one line. Can anyone tell me why this is not working?

$name =~ s/(^| )(.)( |$)/$1$2\.$3/g;

"A A Jones" comes out as "A. A Jones" rather than "A. A. Jones".

cheers,
rsiedl

Re: simpler regex
by Corion (Patriarch) on May 10, 2007 at 07:59 UTC

    That's because with /g, each match picks up where the previous one ended: a character that was part of one match can't be part of the next one. Let's use a different example to make talking easier:

    "X Y Jones"

    The space between X and Y has to do double duty: once as the "end marker" of X and once as the "start marker" of Y. But since it has already been used up as the end marker, it won't be looked at again for the next start marker.
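To see the "used up" space directly, here is a small sketch (the helper name match_ends is hypothetical) that runs the original pattern with //g and records where each match ends, using pos(). For "X Y Jones" only X matches: the space after X is consumed by that match, so Y has no leading space left to match against.

```perl
use strict;
use warnings;

# Run the poster's pattern with //g and record "<char>@<end offset>" for
# every match; pos() gives the offset just past the most recent match.
sub match_ends {
    my ($s) = @_;
    my @hits;
    while ($s =~ /(^| )(.)( |$)/g) {
        push @hits, $2 . '@' . pos($s);
    }
    return @hits;
}

print join(", ", match_ends("X Y Jones")), "\n";   # X@2
```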

    I see two possible ways forward: either use a lookahead to check for the trailing space without consuming it, like /(?=\s|$)/, or use the \b word-boundary anchor, which introduces other problems, though:

    s/\b([A-Z])\b/$1./g

    will also do replacements in O'Reilly, A-J and the like (giving O.'Reilly and A.-J.). So, depending on your input data, that may be unwanted.
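A minimal sketch of the lookahead idea, assuming names contain only ASCII letters (the sub names are hypothetical). The leading space is consumed, but the trailing one is only checked with (?=...), so it is still available as the next match's start marker. The \b version is included for comparison on an O'Reilly-style name.

```perl
use strict;
use warnings;

# Lookahead version: consume the leading space (or match at the start),
# but only peek at the trailing whitespace so it is not used up.
sub add_dots_lookahead {
    my ($name) = @_;
    $name =~ s/(^|\s)([A-Za-z])(?=\s|$)/$1$2./g;
    return $name;
}

# Word-boundary version, for comparison: \b also fires next to ' and -.
sub add_dots_boundary {
    my ($name) = @_;
    $name =~ s/\b([A-Z])\b/$1./g;
    return $name;
}

print add_dots_lookahead("A A Jones"), "\n";     # A. A. Jones
print add_dots_lookahead("Tim O'Reilly"), "\n";  # Tim O'Reilly
print add_dots_boundary("Tim O'Reilly"), "\n";   # Tim O.'Reilly
```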

Re: simpler regex
by BrowserUk (Patriarch) on May 10, 2007 at 08:05 UTC

    Not well tested, but

    $_ = 'A A Jones'; s[(?<=[A-Z])(?=\s)][.]g; print;;
    A. A. Jones
    $_ = 'Bob J Smith'; s[(?<=[A-Z])(?=\s)][.]g; print;;
    Bob J. Smith
    $_ = 'Dr P J van Houten'; s[(?<=[A-Z])(?=\s)][.]g; print;;
    Dr P. J. van Houten

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I think that's going to start doing the wrong thing if Dr van Houten starts putting letters after his name. Any capital letter followed by whitespace gets a dot, so the first set of letters is OK (nothing follows it), but as you add more you get unwanted dots.

      $ perl -le '$_ = q{Dr P J van Houten MD};
      > s[(?<=[A-Z])(?=\s)][.]g;
      > print;'
      Dr P. J. van Houten MD
      $ perl -le '$_ = q{Dr P J van Houten MD FRCS};
      > s[(?<=[A-Z])(?=\s)][.]g;
      > print;'
      Dr P. J. van Houten MD. FRCS
      $

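      A possible solution is to use an alternation of two look-behinds: one for a capital at the start of the string and one for a capital after whitespace. Keeping them as two separate look-behinds keeps each one fixed-width, which Perl's look-behind has traditionally required.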

      $ perl -le '$_ = q{Dr P J van Houten MD FRCS};
      > s{(?:(?<=\A[A-Z])|(?<=\s[A-Z]))(?=\s)}{.}g;
      > print;'
      Dr P. J. van Houten MD FRCS
      $

      Cheers,

      JohnGG
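One thing the look-ahead (?=\s) does not cover is an initial at the very end of the string, which the original poster's third substitution handled. A minimal sketch, assuming that case should be dotted too, widens the look-ahead to (?=\s|\z); the sub name add_dots is hypothetical, and the look-behinds are JohnGG's unchanged.

```perl
use strict;
use warnings;

# JohnGG's two look-behinds, with the look-ahead widened to (?=\s|\z)
# so that a single capital at the end of the string also gets a dot.
sub add_dots {
    my ($name) = @_;
    $name =~ s{(?:(?<=\A[A-Z])|(?<=\s[A-Z]))(?=\s|\z)}{.}g;
    return $name;
}

print add_dots("Dr P J van Houten MD FRCS"), "\n";  # Dr P. J. van Houten MD FRCS
print add_dots("Smith J"), "\n";                    # Smith J.
```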

Re: simpler regex
by borisz (Canon) on May 10, 2007 at 07:59 UTC
    1 while $name =~ s/(^| )(.)( |$)/$1$2\.$3/;
    Boris
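Why the loop works, as a sketch wrapping borisz's line in a hypothetical sub: without /g, every s/// starts scanning at the beginning of the string again, so a space used as the end marker of one match is available as the start marker of the next. The loop stops once s/// returns false, which happens because each inserted dot sits right after the captured character, so that character no longer satisfies ( |$). Each pass rescans from the start, which is fine for names.

```perl
use strict;
use warnings;

# Repeat a single (non-/g) substitution until it no longer matches.
sub add_dots_loop {
    my ($name) = @_;
    1 while $name =~ s/(^| )(.)( |$)/$1$2\.$3/;
    return $name;
}

print add_dots_loop("A A Jones"), "\n";    # A. A. Jones
print add_dots_loop("Bob J Smith"), "\n";  # Bob J. Smith
```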
Re: simpler regex
by scorpio17 (Canon) on May 10, 2007 at 13:42 UTC
    Since there's always more than one way to do it, you could consider operating on each part of the name before doing the join:
    sub fullname {
        my (@parts) = @_;
        for my $p (@parts) {
            $p = length($p) > 1 ? $p :        # not a single - don't change
                 $p =~ /[a-z]/i ? $p . '.' :  # single letter - add a dot
                 $p;                          # not a letter - don't change
        }
        my $name = join(" ", @parts);
        return $name;
    }
    Breaking it up like this might be preferable if the problem were more complex, because really long regexes can be difficult to maintain. For example, if you decided to add a dollar sign in front of single digits, you could just add this line (after the add-a-dot line):
    $p =~ /\d/ ? '$' . $p : # single digit - add a $

    This seems easier (to me) than rewriting the regex, and less likely to introduce subtle bugs due to differences in regex behavior.
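Putting the extra digit line into the per-part version gives something like the following sketch. It keeps scorpio17's structure; only the single-digit branch is new, placed after the add-a-dot branch as described.

```perl
use strict;
use warnings;

# Per-part approach with the optional "single digit" rule added.
sub fullname {
    my (@parts) = @_;
    for my $p (@parts) {             # $p aliases each element of @parts
        $p = length($p) > 1 ? $p :        # not a single - don't change
             $p =~ /[a-z]/i ? $p . '.' :  # single letter - add a dot
             $p =~ /\d/     ? '$' . $p :  # single digit - add a $
             $p;                          # anything else - don't change
    }
    return join(" ", @parts);
}

print fullname('A', 'Jones'), "\n";        # A. Jones
print fullname('Bob', 'J', 'Smith'), "\n"; # Bob J. Smith
print fullname('Lot', '7'), "\n";          # Lot $7
```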

    Unless you're a regex master, I think it's best to keep them as simple as possible. And if you ARE a regex master, but have to work on a team where other people are NOT, then it's STILL best to keep them as simple as possible!

      Following on, the CURRENT team may all be masters, but people move on. You can guarantee (99.99% of the time), that sooner or later a non-master will join ... and they'll curse your name ;-)
Re: simpler regex
by RL (Monk) on May 10, 2007 at 09:20 UTC
    $name =~ s/\b([a-z])\b/$1\./gi;

    Hope this helps
    RL

    update:
    Sorry, missed Corion's answer.

Re: simpler regex
by graff (Chancellor) on May 11, 2007 at 07:12 UTC
    You didn't happen to mention... Is it the case that the input parameters to your "fullname" function are the space-separated tokens that make up the full name?

    If that's what is being passed to the function, then it would be much simpler to add periods as needed before joining the parts together:

    sub fullname {
        my @parts = @_;
        for ( @parts ) {
            $_ .= '.' if ( /^[A-Z]$/ );    # a lone capital letter gets a dot
        }
        return join " ", @parts;
    }
      That would be ideal, but I can't be sure the user will enter the data correctly, e.g. they may type "P J" in the middle-name field...
        That's easy enough to accommodate:
        sub fullname {
            my @parts = @_;
            for ( @parts ) {
                s/(?<!\S)([A-Z])(?!\S)/$1./g;
            }
            return join " ", @parts;
        }
        Of course, given that sort of regex, I guess it doesn't matter whether you join the parts before or after the substitution. (It uses a negative look-behind and a negative look-ahead to check that a single upper-case letter is neither preceded nor followed by a non-whitespace character, and in that case puts a period after the letter.)
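A quick check of the per-part version as a sketch: a part like "P J" gets both initials dotted, and because (?<!\S) and (?!\S) require whitespace or a string edge on both sides, O'Reilly is left alone (unlike the \b version).

```perl
use strict;
use warnings;

# graff's version: dot a capital only if it is neither preceded nor
# followed by a non-whitespace character.
sub fullname {
    my @parts = @_;
    for (@parts) {                   # $_ aliases each element of @parts
        s/(?<!\S)([A-Z])(?!\S)/$1./g;
    }
    return join " ", @parts;
}

print fullname('Bob', 'P J', 'Smith'), "\n";  # Bob P. J. Smith
print fullname('Tim', "O'Reilly"), "\n";      # Tim O'Reilly
```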

Approved by GrandFather