PerlMonks  

Re: running riot with an regx on surnames.

by trantor (Chaplain)
on Nov 27, 2001 at 14:46 UTC ( [id://127770]=note )


in reply to running riot with an regx on surnames.

This is one of those situations where regular expressions can actually complicate things. Why not use a simple split instead?

Also keep in mind that first names are sometimes double or contain non-alphabetic characters, like Anne Marie or Anne-Marie (I've seen them both). In such cases the first name really is Anne Marie; Marie is not a middle name.

Besides the middle name problem, which is objectively hard to solve programmatically, there are several inconsistencies in your script:

$reporter !~ m/(\w+)\s*(.*)/

\s* matches zero or more whitespace characters, which is (I presume) not what you want, because you want to check that you have at least a first name made of alphabetic characters only (check the meaning of \w in perlre: it also matches digits and the underscore, which you don't want). For example, it happily accepts "name" when it should complain, since you are looking for something like "name surname". Also, you may want to look at Death to Dot Star! by the excellent Ovid.
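To make both points concrete, here is a small sketch comparing the pattern from the script with a stricter one. The sub names loose_match and strict_match are made up for illustration, and the stricter pattern is only one ASCII-only possibility, not a complete name validator.

```perl
use strict;
use warnings;

# The pattern from the original script: \s* allows an empty separator,
# and \w also matches digits and the underscore.
sub loose_match  { return $_[0] =~ m/(\w+)\s*(.*)/ ? 1 : 0 }

# Hypothetical stricter check: ASCII letters only for the first word,
# then at least one whitespace character, then something non-blank.
sub strict_match { return $_[0] =~ m/^([A-Za-z]+)\s+(\S.*)$/ ? 1 : 0 }

for my $input ('name', 'name surname', 'n4me_ surname') {
    printf "%-16s loose=%d strict=%d\n",
        "'$input'", loose_match($input), strict_match($input);
}
```

The loose pattern accepts all three inputs, while the stricter one accepts only 'name surname'.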

foreach ($first,$last) { s/^\s+//; s/\s+$//; }

Unnecessary for the first name, because the regexp you've used to capture it (\w+) cannot match whitespace. Again, using split, this would be totally unnecessary for the last name as well, even if it is a double one.

if ($last =~ m/\d/g) { die "Surname should only contain letters, hyphens and apostrophes!"; }

The regexp and the error message state two different things: the regexp only rejects digits. What you say in the error message (which is correct, I assume) would be a pattern like [^A-Za-z'-]. This is a very common mistake when deciding what's good and what's not, and unfortunately it leads to many security holes in programs. Please note that this pattern is by no means complete, because for example it doesn't allow names with accented letters in them, like Björn.
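A minimal sketch of that "reject anything not explicitly allowed" check follows. The sub names are hypothetical, and the \p{L} variant is my own assumption about how accented letters could be admitted; it is not something the original script does.

```perl
use strict;
use warnings;

# ASCII-only version, using the negated class mentioned above:
# the surname is rejected if it is empty or contains any character
# other than a letter, an apostrophe or a hyphen.
sub surname_ok_ascii {
    my ($s) = @_;
    return (length $s && $s !~ m/[^A-Za-z'-]/) ? 1 : 0;
}

# Assumed extension: \p{L} matches any Unicode letter, so names with
# accented letters pass as well.
sub surname_ok_unicode {
    my ($s) = @_;
    return (length $s && $s !~ m/[^\p{L}'-]/) ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $s ("O'Brien", 'Smith-Jones', 'Bl0gg', "Bj\x{f6}rn") {
    printf "%-12s ascii=%d unicode=%d\n",
        $s, surname_ok_ascii($s), surname_ok_unicode($s);
}
```

Note that both versions reject a surname containing a space, such as 'De Blogg'. If double surnames should be accepted, the space has to be added to the allowed set deliberately.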

elsif ($last =~ m/(\w+)\s*(\w+)/) { $surname = join(' ', $1, $3); }

The same note about \s* applies here. Also note that the pattern has only two capturing groups, so you're capturing into $1 and $2, not $3.

A starting point with split would be something like:

#!/usr/bin/perl -w
use strict;

my $n = 'Joe De Blogg';
my($name, $surname) = split /\s+/, $n, 2;
print "name: $name surname: $surname\n";
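Building on that starting point, here is a slightly more defensive sketch. The helper name split_name is hypothetical. It trims the input first, because split /\s+/ returns an empty leading field when the string starts with whitespace, and with a limit of 2 the last field keeps any trailing whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (first, rest), or an empty list when the
# input does not contain at least two whitespace-separated parts.
sub split_name {
    my ($full) = @_;
    $full =~ s/^\s+|\s+$//g;                      # trim both ends first
    my ($first, $rest) = split /\s+/, $full, 2;   # split at the first gap only
    return unless defined $rest && length $rest;
    return ($first, $rest);
}

for my $n ('Joe De Blogg', '  Joe   Blogg  ', 'Joe') {
    if (my ($first, $rest) = split_name($n)) {
        print "name: $first surname: $rest\n";
    }
    else {
        print "cannot split '$n' into name and surname\n";
    }
}
```

This still splits at the first run of whitespace, so a double first name such as Anne Marie ends up with Marie in the surname part. That is the hard problem mentioned above, and this sketch makes no attempt to solve it.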

-- TMTOWTDI
