Re: Parsing Names in a Text File
by davorg (Chancellor) on Jun 15, 2001 at 18:30 UTC
|
You're using the escape sequence \w to match characters
in the surname. \w matches the characters A-Z, a-z, 0-9 and
the underscore (_). Your example contains a dash (-)
character, so you'll need to add that to the list of
allowed characters. Something like this will work:
s/^([A-Z])\w*( [-\w]+)$/$1.$2/g
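For anyone who wants to try that pattern, here is a minimal sketch run against the names from the sample data posted in this thread. The shorten() wrapper is a hypothetical name used only for this illustration, and the /g flag is left off because the ^ and $ anchors allow at most one match per string.

```perl
use strict;
use warnings;

# shorten() is a hypothetical wrapper, added only for this illustration.
sub shorten {
    my ($name) = @_;
    # Keep the leading capital, drop the rest of the given name, and
    # accept word characters or dashes in the surname.
    $name =~ s/^([A-Z])\w*( [-\w]+)$/$1.$2/;
    return $name;
}

print shorten($_), "\n"
    for 'Janeth Arcain', 'Elen Chakirova', 'Kelley Gibson-White';
```

Note that this only works on the bare name, not on the whole pipe-delimited line, so the name field would have to be extracted first.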
--
<http://www.dave.org.uk>
Perl Training in the UK <http://www.iterative-software.com>
Re: Parsing Names in a Text File
by lemming (Priest) on Jun 15, 2001 at 18:33 UTC
|
Check Name Parsing from early this year for more on this
subject. It will give more cases to think about.
update:
I just looked at your strings a bit more closely.
Why don't you split on the "|" and get the second value?
($something, $name, $junk) = split(/\|/, $line, 3);
There are better ways of writing that split, but I'm running
on no sleep.
update on update: looks like more people
gave the split answer while I did that...
Re: Parsing Names in a Text File
by enoch (Chaplain) on Jun 15, 2001 at 18:35 UTC
|
I would pass on using a regex for this one.
while(<FILE>)
{
    @line = split /\|/; # split line on literal pipes (a bare '|' is regex alternation)
    $firstLetter = substr $line[1], 0, 1; # grab first letter
    # split the name entry on whitespace and grab the last name portion
    $lastName = (split(' ', $line[1]))[1];
    # concatenate with period after first letter
    $name = $firstLetter . ". " . $lastName;
}
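A side note on the delimiter, since it is an easy trap: split treats its first argument as a regex, so the pipe needs escaping. The sketch below compares the two forms on a shortened version of one sample line; the variable names are my own.

```perl
use strict;
use warnings;

my $line = '380|Kelley Gibson-White|6|0|85|';

# An unescaped | is alternation between two empty patterns, so it
# matches between every pair of characters.
my @chars = split '|', $line;

# Escaping the pipe makes it a literal delimiter.
my @fields = split /\|/, $line;

print scalar(@chars), " pieces without the escape\n";
print "name field with the escape: $fields[1]\n";
```

Without the escape, $line[1] would hold a single character rather than the name.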
Jeremy
|
I'd recommend taking a slight variant on this in order to handle the middle name problem:
while(<FILE>)
{
my @line = split /\|/; # split line on a literal pipe
#so far as above apart from my declaration
#now split on whitespace
my @names = split ' ',$line[1];
#same idiom for getting the first letter
my $firstletter = substr ($names[0],0,1);
#then get the last item in the name array
my $lastname = $names[-1];
#now do whatever you want to do with the letter and lastname
}
Of course this assumes that all the names are in a givenname middlenames familyname format.
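The same idea can be wrapped in a small sub so it can be tried on names with and without middle names. This is only a sketch under the format assumption above; abbreviate() is a hypothetical helper name, not something from the original post.

```perl
use strict;
use warnings;

# abbreviate() is a hypothetical helper name used for illustration.
sub abbreviate {
    my ($full) = @_;
    my @parts = split ' ', $full;     # split on runs of whitespace
    return $full if @parts < 2;       # single-word names pass through
    my $initial = substr $parts[0], 0, 1;
    return "$initial. $parts[-1]";    # first initial plus last word
}

print abbreviate($_), "\n"
    for 'Kelley Gibson-White', 'Johann Sebastian Bach';
```

Taking $parts[-1] rather than $parts[1] is what lets the middle name drop out, at the cost of mangling any surname that is more than one word.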
Re: Parsing Names in a Text File
by Masem (Monsignor) on Jun 15, 2001 at 18:35 UTC
|
It's probably easier to use split since your file is nicely set up for that; regexes aren't always the right cure for every problem.
my @names;
while (<FILE>) {
my ( undef, $name, @rest ) = split /\|/; # undef skips the first field; $a is best left to sort
push @names, $name;
}
Dr. Michael K. Neylon - mneylon-pm@masemware.com
||
"You've left the lens cap of your mind on again, Pinky" - The Brain
Re: Parsing Names in a Text File
by runrig (Abbot) on Jun 15, 2001 at 18:51 UTC
|
Do it in two steps; it is much clearer than coming up with one regex to do everything:
my $str = '380|Kelley Gibson-White|6|0|85|14.2|3|17|.176|8|8|1.000|';
# Get the name
my ($name) = (split /\|/, $str, 3)[1];
# Reduce the first name to its initial
$name =~ s/(\w)\S*\s+(.*)/$1. $2/;
print $name;
Re: Parsing Names in a Text File
by marvell (Pilgrim) on Jun 15, 2001 at 19:53 UTC
|
Look out for middle names and surnames that start with
"De", etc. They are easily confused. I bet there is a
CPAN module that does this, and I bet it has a list
of "probable" two word surname prefixes.
--
Brother Marvell
|
It can be totally ambiguous, too. Ricky Van Shelton, the singer, has a FIRST NAME of "Ricky Van".
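To make the prefix idea concrete, here is a rough sketch of what such a lookup might do. Everything in it is an assumption for illustration: the four-entry %prefix list, the split_name() name, and the sample name are mine, and a real list would need to be far longer.

```perl
use strict;
use warnings;

# Illustrative prefix list only; a real one would be much longer.
my %prefix = map { $_ => 1 } qw(de van von della);

# split_name() is a hypothetical helper returning (given, surname).
sub split_name {
    my ($full) = @_;
    my @parts   = split ' ', $full;
    my @surname = (pop @parts);
    # Pull prefix words onto the surname, always leaving a given name.
    unshift @surname, pop(@parts)
        while @parts > 1 && $prefix{ lc $parts[-1] };
    return (join(' ', @parts), join(' ', @surname));
}

my ($given, $surname) = split_name('Ludwig van Beethoven');
print "$given / $surname\n";
```

A lookup like this would still get the "Ricky Van" case wrong, since nothing in the string itself says which word the "Van" belongs to.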
Re: Parsing Names in a Text File
by mothra (Hermit) on Jun 15, 2001 at 20:08 UTC
|
open(CUST, "cust_info.txt");
while (<CUST>) { print ((split ('\|'))[1], "\n"); }
Update: My apologies... even after reading the question twice before answering, I still missed what you were trying to do. :)
open(CUST, "cust_info.txt");
while (<CUST>) {
$full_name = (split '\|')[1];
$full_name =~ s/(\w)\w*\s+(.+$)/$1. $2/;
print $full_name, "\n";
}
is probably more what you're looking for.
Re: Parsing Names in a Text File
by Hofmator (Curate) on Jun 15, 2001 at 19:38 UTC
|
I like regexes :) and, well, they are not that
difficult to understand
use strict;
use warnings;
while (<DATA>) {
my $shortname = join '. ', /\|([A-Z])\S*\s+([^|]+)/;
print $shortname," or ";
# or if the rest of the line is to be left alone
s/(\|[A-Z])\S*\s+([^|]+)/$1. $2/;
print;
}
__DATA__
24|Janeth Arcain|6|6|217|36.2|51|106|.481|
321|Elen Chakirova|5|0|27|5.4|2|4|.500|
380|Kelley Gibson-White|6|0|85|14.2|3|17|.176|8|8|1.000|
Of course this still puts some constraints on the names,
e.g. two first names like in 'Johann Sebastian Bach' are
not allowed ... and probably a lot more special cases
-- Hofmator