
Name Parsing

by Adam (Vicar)
on Jan 03, 2001 at 01:01 UTC

Adam has asked for the wisdom of the Perl Monks concerning the following question:

Normally I would just construct the regex myself, but this one seems a bit complex, and if someone else has already solved it, I'd rather go with a proven solution. That said, I need a regex (or whatever) to split a name string into its first and last halves. Any middle name or names would be considered part of the first name, as would titles like Junior, Senior, Doctor, etc. Worse, since some last names have spaces in them, I can't just split on the last space.

Examples of name strings that are considered legit (I italicized the last names for clarity):

  • Catherine Zeta-Jones
  • Jean Claude Van Damme
  • George W. Bush, Jr
  • Madonna
  • Randal L. Schwartz
  • joe schmoe
Is this hopeless? Is there a module? Is there even a good formula or algorithm? Should I just split on the (num spaces/2)+1 space?


Re: Name Parsing
by lhoward (Vicar) on Jan 03, 2001 at 01:06 UTC
    What you're looking for is Lingua::EN::NameParse. Parsing names is a very fuzzy business at best, but the Lingua::EN::NameParse module does a fairly good job of it. Don't expect the output to be perfect.
(Ovid) Re: Name Parsing
by Ovid (Cardinal) on Jan 03, 2001 at 01:06 UTC
    You don't want a regex for that! Despite my love for those magnificent beasts, they have their limitations and what you need requires properly parsing the data. Try Lingua::EN::NameParse.

Re: Name Parsing
by boo_radley (Parson) on Jan 03, 2001 at 01:04 UTC
    Being the shameless CPAN searchmonkey that I am: Name Parser
($code or die) Re: Name Parsing - TPJ rules!
by $code or die (Deacon) on Jan 03, 2001 at 07:37 UTC
    Lingua::EN::NameParse has already been mentioned. But if you can, check out the most recent Perl Journal. There was an article in it written by someone who used that module to update and cross-reference employee databases (if my memory serves me correctly). The entries in the databases differed: sometimes they had nicknames or initials instead of full first names. I will stop rabbiting on about this article because you'll enjoy it more if you read it yourself.

    Update: I should probably rename my nick to PerlJournalFAN because a growing %-age of my posts refer to it. It's great! Go Subscribe! Hard-copy and online versions available! YAH!
