http://qs321.pair.com?node_id=11117675


in reply to Split confusion

You can't safely apply a fixed set of capitalization rules to a person's name; you pretty much have to take it as they write it. Once that information is lost, you can't regenerate it. "How to capitalize author names" notes that particles such as de, d', van, and von are often left uncapitalized. So James Van Den Berghe could be spelled as I have done, or it could be James van den Berghe, or some other combination entirely. And some names defy all conventions.

For your problem statement I would do this:

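    #!/usr/bin/env perl
    use strict;
    use warnings;

    my @names = (
        'VAN DEN BERGHE',
        'OSWALD',
        'ANDERSON',
        'LLOYD-WRIGHT',
    );

    foreach my $name (@names) {
        # The capturing group in split keeps the separators (whitespace or
        # hyphen) in the list, so join('') restores them unchanged.
        my $altered = join('', map {ucfirst(lc($_))} split /(\s+|-)/, $name);
        print "$name => $altered\n";
    }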

which produces:

    VAN DEN BERGHE => Van Den Berghe
    OSWALD => Oswald
    ANDERSON => Anderson
    LLOYD-WRIGHT => Lloyd-Wright

But that doesn't make any attempt at dealing with the nuances discussed above.
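
If you did want to take a first stab at those nuances, one rough approach is to keep a lookup of particles that stay lowercase. This is only a sketch: the %particle list and the fix_case sub are names I made up for illustration, the list is a guess rather than any authoritative rule, and it does not handle d' or names whose owners capitalize the particle.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical list of particles to leave in lowercase. This is an
# assumption for illustration, not a rule that holds for every name.
my %particle = map { $_ => 1 } qw(de van von den der);

sub fix_case {
    my ($name) = @_;
    # Split on whitespace or hyphen, keeping the separators via the
    # capturing group, then rebuild the name piece by piece.
    return join '', map {
        my $word = lc;
        $particle{$word} ? $word : ucfirst $word;
    } split /(\s+|-)/, $name;
}

foreach my $name ('VAN DEN BERGHE', 'OSWALD', 'LLOYD-WRIGHT') {
    print "$name => ", fix_case($name), "\n";
}
```

With that list, VAN DEN BERGHE comes out as van den Berghe, which is still wrong for anyone who writes it Van Den Berghe. That is the point of the caveat above: the original capitalization can't be recovered, only guessed.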


Dave

Re^2: Split confusion
by swampyankee (Parson) on Jun 04, 2020 at 01:58 UTC

    The list I'm case-correcting is from the rosters for several high school classes I teach; there are few enough exceptions to either the split/join or the regex code that editing the remaining one or two by hand is trivial.

