
Re: Split confusion

by davido (Cardinal)
on Jun 03, 2020 at 22:07 UTC ( #11117675=note )

in reply to Split confusion

You can't safely apply a fixed set of capitalization rules to a person's name. You pretty much have to take it as they write it, and once that information is lost, you can't regenerate it. "How to capitalize author names" points out that particles such as de, d', van, and von may not be capitalized. So James Van Den Berghe could be spelled as I have done, or it could be James van den Berghe, or some other combination. And some names defy all conventions.

For your problem statement I would do this:

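#!/usr/bin/env perl
use strict;
use warnings;

my @names = (
    'VAN DEN BERGHE',
    'OSWALD',
    'ANDERSON',
    'LLOYD-WRIGHT',
);

foreach my $name (@names) {
    # The capturing group in split keeps the separators (whitespace or hyphen),
    # so join('') puts the name back together unchanged apart from case.
    my $altered = join('', map {ucfirst(lc($_))} split /(\s+|-)/, $name);
    print "$name => $altered\n";
}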

which produces:

VAN DEN BERGHE => Van Den Berghe
OSWALD => Oswald
ANDERSON => Anderson
LLOYD-WRIGHT => Lloyd-Wright

But that doesn't make any attempt at dealing with the nuances discussed above.
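
If you did want to handle the most common particles, one possible extension is sketched below. It is only an illustration: the fix_case name and the particle list (de, van, von, den, der, plus a d' prefix) are my own assumptions, and it always lowercases those particles, which will be wrong for anyone who writes their name as "Van Den Berghe".

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical list of particles to leave lowercase; extend or trim to suit your data.
my %lower_particle = map { $_ => 1 } qw(de van von den der);

# fix_case is an illustrative helper, not part of any module.
sub fix_case {
    my ($name) = @_;

    # Capturing the separator keeps whitespace and hyphens in the result.
    my @parts = split /(\s+|-)/, $name;

    for my $part (@parts) {              # $part aliases the array element
        next if $part =~ /^(?:\s+|-)$/;  # leave separators untouched
        my $lc = lc $part;
        if ($lower_particle{$lc}) {
            $part = $lc;                 # e.g. VAN => van
        }
        elsif ($lc =~ /^d'(.+)$/) {
            $part = "d'" . ucfirst($1);  # e.g. D'ARCY => d'Arcy
        }
        else {
            $part = ucfirst($lc);        # e.g. OSWALD => Oswald
        }
    }
    return join '', @parts;
}

print "$_ => ", fix_case($_), "\n"
    for 'VAN DEN BERGHE', "D'ARCY", 'LLOYD-WRIGHT', 'OSWALD';
```

Even with a lookup table like this, names that don't follow the pattern still need to be corrected by hand.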


Replies are listed 'Best First'.
Re^2: Split confusion
by swampyankee (Parson) on Jun 04, 2020 at 01:58 UTC

    The list I'm case-correcting is from the rosters for several high school classes I teach; there are few enough exceptions to either the split/join or the regex code that editing the remaining one or two by hand is trivial.
