http://qs321.pair.com?node_id=127770


in reply to running riot with an regx on surnames.

This is one of those situations where regular expressions can actually complicate things; why not use a simple split instead?

Also keep in mind that first names can sometimes be double or contain non-alphabetic characters, like Anne Marie or Anne-Marie (I've seen both), and in that case the first name really is "Anne Marie": Marie is not a middle name.

Besides the middle-name problem, which can be genuinely hard to solve programmatically, there are several inconsistencies in your script:

$reporter !~ m/(\w+)\s*(.*)/

\s* matches zero or more blanks, which is (I presume) not what you want: you want to check that you have at least a first name made of letters only (check the meaning of \w in perlre! It also matches digits and the underscore, among other characters you don't want). For example, the pattern happily accepts "name" on its own when it should complain and ask for something like "name surname". Also, you may want to look at Death to Dot Star! by the excellent Ovid.
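To make the difference concrete, here is a small sketch comparing your pattern with a stricter one. The helper names loose_ok and strict_ok are made up for illustration, and the stricter rule (ASCII letters for the first name, at least one blank, then a non-blank surname) is only my assumption about what you want; like any [A-Za-z] check, it also rejects accented names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Your pattern: \w also allows digits and '_', and \s* lets a single
# word through with an empty second capture.
sub loose_ok {
    my ($s) = @_;
    return $s =~ m/(\w+)\s*(.*)/ ? 1 : 0;
}

# Hypothetical stricter rule: ASCII letters, at least one blank,
# then something non-blank, anchored at both ends.
sub strict_ok {
    my ($s) = @_;
    return $s =~ m/^([A-Za-z]+)\s+(\S.*)\z/ ? 1 : 0;
}

for my $input ('name', 'Joe_42', 'Joe Blogg') {
    printf "%-10s loose=%d strict=%d\n",
        $input, loose_ok($input), strict_ok($input);
}
```

Both "name" and "Joe_42" get past the loose check; only "Joe Blogg" passes the strict one.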

foreach ($first,$last) { s/^\s+//; s/\s+$//; }

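This is unnecessary for the first name, because of the regexp you've used to capture it. Again, with split it would be mostly unnecessary for the last name as well, even a double one (the only leftover is trailing whitespace, which a split with a limit keeps in the last field).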

if ($last =~ m/\d/g) { die "Surname should only contain letters, hyphens and apostrophes!"; }

The regexp and the error message state two different things: the test only rejects digits, while the message promises that only letters, hyphens and apostrophes are allowed. What the error message says (which is correct, I assume) would be checked by dying whenever the surname matches a pattern like [^A-Za-z'-]. Rejecting a few known-bad characters instead of accepting only known-good ones is a very common mistake when deciding what's good and what's not, and unfortunately it leads to many security holes. Please note that this class is by no means complete: for example, it doesn't allow names with accented letters in them, like Björn.
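Here is a sketch of both checks side by side (blacklist_ok and whitelist_ok are hypothetical names). The whitelist version rejects any string containing a character outside [A-Za-z'-], and additionally rejects the empty string, which the negated class alone would let through:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Blacklist, as in your script: rejects digits and nothing else.
sub blacklist_ok {
    my ($s) = @_;
    return $s =~ m/\d/ ? 0 : 1;
}

# Whitelist matching the error message: reject the empty string and
# anything containing a character other than ASCII letters, ' and -.
sub whitelist_ok {
    my ($s) = @_;
    return (length($s) && $s !~ m/[^A-Za-z'-]/) ? 1 : 0;
}

for my $last ("O'Brien-Smith", 'Bl0gg', 'Blogg; rm -rf') {
    printf "%-15s blacklist=%d whitelist=%d\n",
        $last, blacklist_ok($last), whitelist_ok($last);
}
```

The blacklist happily lets "Blogg; rm -rf" through. Note that, like your error message, the whitelist also rejects spaces, so a double surname such as "De Blogg" would need a blank added to the class.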

elsif ($last =~ m/(\w+)\s*(\w+)/) { $surname = join(' ', $1, $3); }

The same note about \s* applies here. Also note that the second group is captured in $2, not $3, so $3 is always undef in this branch.
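You can see both problems with a quick test. The captures helper below is hypothetical; it just runs your pattern and reports $1, $2 and whether $3 is defined. Because \s* may match nothing, a single word like "DeBlogg" is still split in two by backtracking, into "DeBlog" and "g":

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run the pattern from your script and report
# $1, $2 and whether a third capture exists (it never does).
sub captures {
    my ($s) = @_;
    return unless $s =~ m/(\w+)\s*(\w+)/;
    return ($1, $2, defined $3 ? 1 : 0);
}

for my $last ('De Blogg', 'DeBlogg') {
    my ($first_part, $second_part, $has3) = captures($last);
    printf "%-9s \$1='%s' \$2='%s' \$3 defined: %d\n",
        $last, $first_part, $second_part, $has3;
}
```

Writing join(' ', $1, $2) fixes the capture number, but not the \s* issue.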

A starting point with split would be something like:

#!/usr/bin/perl -w
use strict;

my $n = 'Joe De Blogg';
my ($name, $surname) = split /\s+/, $n, 2;
print "name: $name surname: $surname\n";
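If you want to go a little further, here is a sketch that also covers a few of the edge cases above. split_name is a hypothetical helper, and it swaps split /\s+/ for split ' ' (a single-space string), which Perl treats specially by skipping leading whitespace. With a limit of 2, any trailing whitespace stays in the last field, so it is trimmed explicitly, and a single word with no surname is reported instead of accepted:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a full name into (first name, surname).
# Returns an empty list when no surname is present.
sub split_name {
    my ($full) = @_;
    my ($name, $surname) = split ' ', $full, 2;
    return unless defined $surname;    # a single word: no surname at all
    $surname =~ s/\s+\z//;             # the limit keeps trailing blanks; drop them
    return unless length $surname;     # e.g. "name  ": only blanks after the name
    return ($name, $surname);
}

for my $n ('Joe De Blogg', '  Anne-Marie   Smith  ', 'name') {
    my ($name, $surname) = split_name($n);
    if (defined $surname) {
        print "name: $name surname: $surname\n";
    }
    else {
        print "'$n': please give both name and surname\n";
    }
}
```

It still can't tell "Anne Marie Blogg" (double first name) from a middle name; that part needs a rule from you, not from split.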

-- TMTOWTDI