Parsing Names in a Text File

Perl Newby has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Parsing Names in a Text File by davorg (Chancellor) on Jun 15, 2001 at 18:30 UTC
You're using the escape sequence \w to match characters in the surname. \w matches the characters A-Z, a-z, 0-9 and the underscore (_). Your example contains a dash (-) character, so you'll need to add that to the list of allowed characters. Something like this will work: `s/^([A-Z])\w*( [-\w]+)$/$1.$2/g`

-- <http://www.dave.org.uk> Perl Training in the UK <http://www.iterative-software.com>
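
To see the difference the dash makes, here is a minimal sketch comparing a plain `\w+` surname pattern with the `[-\w]+` character class. It assumes the name has already been pulled out of the record; the variable names are only for illustration.

```perl
use strict;
use warnings;

my $name = 'Kelley Gibson-White';

# \w+ cannot cross the hyphen, so the anchored pattern fails and nothing changes.
(my $plain = $name) =~ s/^([A-Z])\w*\s+(\w+)$/$1. $2/;

# [-\w]+ accepts hyphens as well as word characters, so the substitution applies.
(my $fixed = $name) =~ s/^([A-Z])\w*\s+([-\w]+)$/$1. $2/;

print "$plain\n";
print "$fixed\n";
```

Putting the dash first inside the brackets keeps it from being read as a range.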
Re: Parsing Names in a Text File by lemming (Priest) on Jun 15, 2001 at 18:33 UTC
Check Name Parsing from early this year for more on this subject. It will give more cases to think about.

Update: I just looked at your strings a bit more closely. Why don't you split on the "|" and get the second value? `($something, $name, $junk) = split(/\|/, $line, 3);` There are better ways of writing that split, but I'm running on no sleep.

Update on update: looks like more people gave the split answer while I did that...
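
For anyone unsure what the third argument to split does, here is a small sketch using one of the sample lines from this thread. The variable names are just placeholders.

```perl
use strict;
use warnings;

my $line = '380|Kelley Gibson-White|6|0|85|14.2|3|17|.176|8|8|1.000|';

# The pipe is a regex metacharacter, so it has to be escaped in the pattern.
# A limit of 3 means at most three fields come back: splitting stops after
# the second pipe and the remainder is returned as a single string.
my ($id, $name, $rest) = split /\|/, $line, 3;

print "$name\n";   # the second field
print "$rest\n";   # everything after the second pipe, left unsplit
```

The limit saves a little work when only the leading fields matter, since the rest of the line is never broken up.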
Re: Parsing Names in a Text File by enoch (Chaplain) on Jun 15, 2001 at 18:35 UTC
I would pass on using a regex for this one.

    while (<FILE>) {
        my @line = split /\|/;                        # split line
        my $firstLetter = substr $line[1], 0, 1;      # grab first letter
        my $lastName = (split(' ', $line[1]))[1];     # split the name entry on whitespace and grab the last name portion
        my $name = $firstLetter . ". " . $lastName;   # concatenate with period after first letter
    }

Jeremy
Re: Re: Parsing Names in a Text File by Anonymous Monk on Jun 15, 2001 at 20:41 UTC
I'd recommend taking a slight variant on this in order to handle the middle name problem:

    while (<FILE>) {
        my @line = split /\|/;                       # split line, as above apart from the my declaration
        my @names = split ' ', $line[1];             # now split the name on whitespace
        my $firstletter = substr($names[0], 0, 1);   # same idiom for getting the first letter
        my $lastname = $names[-1];                   # then get the last item in the name array
        # now do whatever you want to do with the letter and lastname
    }

Of course, this assumes that all the names are in a givenname middlenames familyname format.
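
The same idea can be wrapped up as a function, which makes the edge cases easier to see. This is only a sketch: `short_name` is a made-up name, the name is assumed to be the second pipe-delimited field, and returning a one-word name unchanged is my own choice rather than anything the original question asked for.

```perl
use strict;
use warnings;

# Hypothetical helper: takes one pipe-delimited record and returns "F. Last".
# Assumes the name is the second field, in "given [middle ...] family" order.
sub short_name {
    my ($record) = @_;

    my $name = (split /\|/, $record)[1];
    return unless defined $name;          # record had no second field

    my @parts = split ' ', $name;
    return unless @parts;                 # name field was empty
    return $parts[0] if @parts == 1;      # one word only: nothing to abbreviate

    # First letter of the first word, plus the last word as the surname.
    return substr($parts[0], 0, 1) . '. ' . $parts[-1];
}

print short_name('380|Kelley Gibson-White|6|0|85|'), "\n";
print short_name('1|Johann Sebastian Bach|5|'), "\n";
```

Middle names are simply dropped, so this still inherits the givenname middlenames familyname assumption.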
Re: Parsing Names in a Text File by Masem (Monsignor) on Jun 15, 2001 at 18:35 UTC
It's probably easier to use split since your file is nicely set up for that; regexes aren't always the right cure for every problem.

    my @names;
    while (<FILE>) {
        my ( $id, $name, @rest ) = split /\|/;
        push @names, $name;
    }

Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
Re: Parsing Names in a Text File by runrig (Abbot) on Jun 15, 2001 at 18:51 UTC
Do it in two steps; it's much clearer than coming up with one regex to do everything:

    my $str = '380|Kelley Gibson-White|6|0|85|14.2|3|17|.176|8|8|1.000|';

    # Get the name
    my ($name) = (split /\|/, $str, 3)[1];

    # Reduce the first name to an initial
    $name =~ s/(\w)\S*\s+(.)/$1. $2/;

    print $name;
Re: Parsing Names in a Text File by marvell (Pilgrim) on Jun 15, 2001 at 19:53 UTC
Look out for middle names and surnames that start with "De", etc. They are easily confused. I bet there is a CPAN module that does this, and I bet it has a list of "probable" two-word surname prefixes.

-- Brother Marvell
Re: Re: Parsing Names in a Text File by John M. Dlugosz (Monsignor) on Jun 16, 2001 at 02:52 UTC
It can be totally ambiguous, too. Ricky Van Shelton, the singer, has a FIRST NAME of "Ricky Van".
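
For what it's worth, the prefix-list idea could look something like the sketch below. The prefix list is invented for illustration and is nowhere near complete, and `short_name_with_prefix` is a hypothetical name, not something taken from a CPAN module.

```perl
use strict;
use warnings;

# Hypothetical, deliberately incomplete list of surname prefixes (lowercased).
my %SURNAME_PREFIX = map { $_ => 1 } qw(de van von del la le di);

# Returns "F. Surname", keeping any run of listed prefixes with the surname.
# Assumes "given [middle ...] [prefix ...] family" word order.
sub short_name_with_prefix {
    my ($name) = @_;

    my @parts = split ' ', $name;
    return $name if @parts < 2;           # nothing to abbreviate

    my $start = $#parts;                  # index where the surname begins
    # Walk backwards over prefix words, but never swallow the first word.
    $start-- while $start > 1 && $SURNAME_PREFIX{ lc $parts[ $start - 1 ] };

    return substr($parts[0], 0, 1) . '. ' . join(' ', @parts[ $start .. $#parts ]);
}

print short_name_with_prefix('Robert De Niro'), "\n";
print short_name_with_prefix('Johann Sebastian Bach'), "\n";
```

This only guesses. A given name that happens to end in a word on the list, as in the case just mentioned, will be treated as part of the surname, and no list can settle that without knowing the person.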
Re: Parsing Names in a Text File by mothra (Hermit) on Jun 15, 2001 at 20:08 UTC
Easy enough:

    open(CUST, "cust_info.txt") or die "Can't open cust_info.txt: $!";
    while (<CUST>) {
        print ((split ('\|'))[1], "\n");
    }

Update: My apologies... even after reading the question twice before answering, I still missed what you were trying to do. :)

    open(CUST, "cust_info.txt") or die "Can't open cust_info.txt: $!";
    while (<CUST>) {
        my $full_name = (split '\|')[1];
        $full_name =~ s/(\w)\w*\s+(.+$)/$1. $2/;
        print $full_name, "\n";
    }

is probably more what you're looking for.
Re: Parsing Names in a Text File by Hofmator (Curate) on Jun 15, 2001 at 19:38 UTC
I like regexes :) and, well, they are not that difficult to understand:

    use strict;
    use warnings;

    while (<DATA>) {
        my $shortname = join '. ', /\|([A-Z])\S*\s+([^|]+)/;
        print $shortname, " or ";

        # or if the rest of the line is to be left alone
        s/(\|[A-Z])\S*\s+([^|]+)/$1. $2/;
        print;
    }

    __DATA__
    24|Janeth Arcain|6|6|217|36.2|51|106|.481|
    321|Elen Chakirova|5|0|27|5.4|2|4|.500|
    380|Kelley Gibson-White|6|0|85|14.2|3|17|.176|8|8|1.000|

Of course, this still puts some constraints on the names, e.g. two first names like in 'Johann Sebastian Bach' are not allowed ... and probably a lot more special cases.

-- Hofmator

