PerlMonks  


This is one of those situations where regular expressions can actually complicate things. Why not use a simple split instead?

Also keep in mind that first names are sometimes double or contain non-alphabetic characters, like Anne Marie or Anne-Marie (I've seen them both). In such cases the first name is actually Anne Marie; Marie is not a middle name.

Besides the middle name problem, which can be genuinely hard to solve programmatically, there are several inconsistencies in your script:

$reporter !~ m/(\w+)\s*(.*)/

\s* matches zero or more blanks, which is (I presume) not what you want: you want to check that you have at least a first name made of alphabetic characters only, followed by a surname. (Check the meaning of \w in perlre: it also matches digits and the underscore, which you don't want.) As written, the test happily accepts "name" when it should complain, since you are looking for something like "name surname". Also, you may want to look at Death to Dot Star! by the excellent Ovid.
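To make both points concrete, here is a small sketch. The sample strings and the stricter pattern are my own illustration of what I presume was intended; they are not taken from the original script.

```perl
use strict;
use warnings;

# The pattern from the script: \s* allows zero blanks, so a lone word passes,
# and \w also lets digits and underscores through.
my $loose = qr/(\w+)\s*(.*)/;

# A stricter variant (an assumption about the intent): ASCII letters only,
# then at least one blank, then something that starts with a non-blank.
my $strict = qr/^([A-Za-z]+)\s+(\S.*)$/;

for my $reporter ('name', 'name surname', 'j0e_99 surname') {
    printf "%-16s loose=%s strict=%s\n",
        "'$reporter'",
        ($reporter =~ $loose  ? 'ok' : 'rejected'),
        ($reporter =~ $strict ? 'ok' : 'rejected');
}
```

All three strings pass the loose pattern; only 'name surname' passes the strict one.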

foreach ($first,$last) { s/^\s+//; s/\s+$//; }

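Unnecessary for the first name, because of the regexp you've used to capture it: \w+ cannot match blanks. Again, using split, this would be unnecessary for the last name as well, even if it is double (though a trailing blank in the input would still end up in the last field).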

if ($last =~ m/\d/g) { die "Surname should only contain letters, hyphens and apostrophes!"; }

The regexp and the error message state two different things: the regexp only rejects digits, while the message promises letters, hyphens and apostrophes only. What you say in the error message (which is correct, I assume) would be a pattern like [^A-Za-z'-]. Rejecting a few known-bad characters instead of allowing only known-good ones is a very common mistake, and unfortunately it leads to many security holes in programs. Please note that even this character class is by no means complete, because it doesn't consider names with accented letters in them, like Björn.
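A minimal sketch of the "allow only what is good" approach follows. The helper name surname_ok and the sample surnames are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if the surname is non-empty and consists
# solely of ASCII letters, hyphens and apostrophes.
sub surname_ok {
    my ($last) = @_;
    return $last =~ /\A[A-Za-z'-]+\z/ ? 1 : 0;
}

for my $last ("O'Brien", 'Smith-Jones', 'Bl0gg', 'Blogg;rm') {
    print "$last: ", (surname_ok($last) ? 'accepted' : 'rejected'), "\n";
}
```

The first two surnames are accepted and the last two rejected. This class still rejects Björn, and a double surname such as De Blogg would need a blank added to it. If the input is already decoded Unicode text, \p{L} could stand in for A-Za-z, but that depends on how your script reads its data.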

elsif ($last =~ m/(\w+)\s*(\w+)/) { $surname = join(' ', $1, $3); }

The same note about \s* applies here. Also note that the second pair of parentheses captures into $2, not $3; there is no third group, so $3 is undefined.
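For illustration, a corrected version of that branch might look like the sketch below. The helper name join_double is hypothetical, and anchoring the pattern with ^ and $ is my own assumption about what you want.

```perl
use strict;
use warnings;

# Hypothetical helper: if the surname has exactly two parts, rejoin them
# with a single blank; otherwise return undef.
sub join_double {
    my ($last) = @_;
    # Two pairs of parentheses give two capture variables: $1 and $2.
    # \s+ (rather than \s*) insists on at least one blank between the parts.
    if ($last =~ m/^(\w+)\s+(\w+)$/) {
        return join ' ', $1, $2;
    }
    return undef;
}

my $surname = join_double('De   Blogg');
print "surname: $surname\n" if defined $surname;
```

Note that \w is kept here only to stay close to the original line; the earlier remark about digits and underscores still applies to it.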

A starting point with split would be something like:

#!/usr/bin/perl -w
use strict;

my $n = 'Joe De Blogg';
my ($name, $surname) = split /\s+/, $n, 2;
print "name: $name surname: $surname\n";
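If the input may carry stray blanks or consist of a single word, a slightly more defensive variant could look like this. The helper name split_name and the test strings are hypothetical; treat it as a sketch rather than a finished solution.

```perl
use strict;
use warnings;

# Hypothetical helper: split a full name into the first word and the rest.
sub split_name {
    my ($full) = @_;
    $full =~ s/^\s+|\s+$//g;                      # trim both ends first
    my ($name, $surname) = split ' ', $full, 2;   # ' ' splits on runs of blanks
    return ($name, $surname);                     # $surname is undef for one word
}

for my $n ('Joe De Blogg', '  Joe   De Blogg  ', 'Joe') {
    my ($name, $surname) = split_name($n);
    print "name: $name surname: ",
        (defined $surname ? $surname : '(missing)'), "\n";
}
```

Trimming first matters because, with a limit of 2, split leaves the last field exactly as it found it, trailing blanks included. The double first name problem (Anne Marie) remains unsolved here: the second word always lands in the surname.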

-- TMTOWTDI


In reply to Re: running riot with an regx on surnames. by trantor
in thread running riot with an regx on surnames. by maderman
