As an exercise, I wrote a new comparison routine for use with sort. It splits each string to be compared into parts (runs of letters or runs of digits), and if two corresponding parts are both numbers, they are compared numerically; otherwise they are compared as strings. Whitespace in front of a part is ignored. When one string runs out of parts before the other, it is considered the smaller one (that is, "abc" is smaller than "abcde").
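    sub dictcmp
    {
        # Work on copies so that sort's $a and $b are left untouched.
        my $ac = $a;
        my $bc = $b;
        while( 1 ) {
            # Peel the next part off each string:
            # element 0 is the part, element 3 is the rest of the string.
            my @a = $ac =~ /^\s*(([A-Za-z]+)|(\d+))(.*)/;
            my @b = $bc =~ /^\s*(([A-Za-z]+)|(\d+))(.*)/;
            # A failed match means that string has no parts left.
            return  0 if !defined $a[0] && !defined $b[0];
            return -1 if !defined $a[0];
            return  1 if !defined $b[0];
            my $res;
            if( $a[0] =~ /^\d+$/ && $b[0] =~ /^\d+$/ ) {
                $res = $a[0] <=> $b[0];
            }
            else {
                $res = $a[0] cmp $b[0];
            }
            return $res if $res;
            $ac = $a[3];
            $bc = $b[3];
        }
    }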
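To see how the regular expression carves a string into parts, here is a small standalone sketch. The names $rest and @parts are only for illustration and are not part of dictcmp:

```perl
use strict;
use warnings;

# Peel parts off "x10 y" one at a time, the same way the loop in dictcmp does.
my $rest = "x10 y";
my @parts;
while ( my @m = $rest =~ /^\s*(([A-Za-z]+)|(\d+))(.*)/ ) {
    push @parts, $m[0];    # element 0 is the part just matched
    $rest = $m[3];         # element 3 is whatever follows it
}
print join( "|", @parts ), "\n";    # prints x|10|y
```

The loop ends as soon as the match fails, which is the same condition dictcmp treats as "this string has no parts left".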
Calling this routine optimized would be a blatant lie, and I'm sure a lot of things can be improved. It does work, however, at least for simple input data:
    my @list = ( "x10 y", "1abc", "a10y", "x9y", " b1" );
    print join "\n", sort dictcmp @list;

yields (note that " b1" keeps its leading space when printed):

    1abc
    a10y
     b1
    x9y
    x10 y
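One possible improvement, sketched under the assumption that re-splitting the same strings on every comparison is the main cost: split each string once up front and sort on the pre-split parts. The helper names tokens and cmp_tokens are hypothetical, and this variant has not been benchmarked against dictcmp:

```perl
use strict;
use warnings;

# Hypothetical helper: return the parts of a string as an array ref.
# \G makes each match start where the previous one ended, so splitting
# stops at the first character that is not a letter, digit or whitespace,
# just as dictcmp does.
sub tokens { return [ $_[0] =~ /\G\s*([A-Za-z]+|\d+)/g ] }

# Hypothetical helper: compare two part lists with the same rules as dictcmp.
sub cmp_tokens {
    my ( $x, $y ) = @_;
    for my $i ( 0 .. $#$x ) {
        return 1 if $i > $#$y;    # $y ran out first, so $y sorts first
        my ( $p, $q ) = ( $x->[$i], $y->[$i] );
        my $res = ( $p =~ /^\d/ && $q =~ /^\d/ ) ? $p <=> $q : $p cmp $q;
        return $res if $res;
    }
    return @$x <=> @$y;           # -1 if $y still has parts left, else 0
}

my @list   = ( "x10 y", "1abc", "a10y", "x9y", " b1" );
my @sorted = map  { $_->[0] }                          # 3. unwrap the strings
             sort { cmp_tokens( $a->[1], $b->[1] ) }   # 2. sort on the parts
             map  { [ $_, tokens($_) ] } @list;        # 1. split each string once
print join( "\n", @sorted ), "\n";
```

For the sample list this gives the same order as dictcmp. The trade-off is extra memory for the part lists in exchange for running the regular expression once per string instead of once per comparison.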
Cheers,
--Moodster