http://qs321.pair.com?node_id=49379

Adam has asked for the wisdom of the Perl Monks concerning the following question:

Normally I would just construct the regex myself, but this one seems a bit complex, and if someone else has already solved it, I'd rather go with a proven solution. That said, I need a regex (or whatever) to split a name string into its first and last halves. Any middle name or names would be considered part of the first name, as would titles like Junior, Senior, Doctor, etc. Worse -- since some last names have spaces in them, I can't just split on the last space.

Examples of name strings that are considered legit (I italicized the last names for clarity):

Is this hopeless? Is there a module? Is there even a good formula or algorithm? Should I just split on the (num spaces/2)+1 space?

Thanks!

Replies are listed 'Best First'.
Re: Name Parsing
by lhoward (Vicar) on Jan 03, 2001 at 01:06 UTC
    What you're looking for is Lingua::EN::NameParse. Parsing names is a very fuzzy business at best, but the Lingua::EN::NameParse module does a fairly good job of it. Don't expect the output to be perfect.
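    To illustrate why this is fuzzy, here is a minimal hand-rolled sketch of the splitting problem from the question. It is not how Lingua::EN::NameParse works internally; the particle list and the split_name function are hypothetical, made up for this example, and deliberately incomplete.

```perl
use strict;
use warnings;

# Hypothetical, deliberately incomplete list of surname particles (an assumption).
my %PARTICLE = map { $_ => 1 } qw(van von der den de del della di la le da dos du);

# Split a full name into (first, last). The final word always goes to the
# surname; earlier words are pulled in while they are known particles,
# but at least one word is always left for the first name.
sub split_name {
    my ($full) = @_;
    my @words = split ' ', $full;
    return ('', '') unless @words;

    my @last = (pop @words);
    while (@words > 1 && $PARTICLE{ lc $words[-1] }) {
        unshift @last, pop @words;
    }
    return (join(' ', @words), join(' ', @last));
}

for my $name ('Ludwig van Beethoven', 'Mary Jane Smith', 'Oscar de la Renta') {
    my ($first, $last) = split_name($name);
    print "$name => first: '$first', last: '$last'\n";
}
```

    A sketch like this breaks as soon as a name has a trailing suffix ("John Smith Jr." ends up with "Jr." as the last name) or a multi-word surname with no particle in it ("Ralph Vaughan Williams"), which is the kind of case the module is meant to handle better.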
(Ovid) Re: Name Parsing
by Ovid (Cardinal) on Jan 03, 2001 at 01:06 UTC
    You don't want a regex for that! Despite my love for those magnificent beasts, they have their limitations and what you need requires properly parsing the data. Try Lingua::EN::NameParse.

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Re: Name Parsing
by boo_radley (Parson) on Jan 03, 2001 at 01:04 UTC
    Being the shameless CPAN searchmonkey that I am: Name Parser
($code or die) Re: Name Parsing - TPJ rules!
by $code or die (Deacon) on Jan 03, 2001 at 07:37 UTC
    Lingua::EN::NameParse has already been mentioned. But if you can, check out the most recent Perl Journal. There was an article there written by someone who used that module to update and cross-reference employee databases (if my memory serves me correctly). The entries in the databases differed: sometimes they had nicknames or initials instead of full first names. I will stop rabbiting on about this article because you'll enjoy it more if you read it yourself.

    Update: I should probably rename my nick to PerlJournalFAN because a growing %-age of my posts refer to it. It's great! Go Subscribe! Hard-copy and online versions available! YAH!