note
nobull
If the input data is known to be ordered so that duplicates are always adjacent, then the problem simplifies to:
<code>
use strict;
use warnings;

my $seen = '';
while (<DATA>) {
    # Records are name:phone:address:date:salary, duplicates adjacent
    my ($name, $phoneno, $address, $date, $salary) = split /:/;
    next if $seen eq $name;   # skip repeats of the record just printed
    $seen = $name;
    chomp $salary;            # strip the newline from the last field
    $salary += ($salary * 10) / 100;   # apply the 10% raise
    print "$name:$phoneno:$address:$date:$salary\n";
}
__DATA__
Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500
Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500
Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500
Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700
Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700
Tommy Savage:408-724-0140:1222 Oxbow Court, Sunnyvale,CA 94087:5/19/66:34200
Tommy Savage:408-724-0140:1222 Oxbow Court, Sunnyvale,CA 94087:5/19/66:34200
Lesle Kerstin:408-456-1234:4 Harvard Square, Boston, MA 02133:4/22/62:52600
JonDeLoach:408-253-3122:123 Park St., San Jose, CA 94086:7/25/53:85100
</code>
<p>When dealing with very large data sets it can make sense to use a highly optimised external sort tool such as <a href="http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html#sort-invocation" >GNU sort</a> to put the data into an order that lets you process it with <a href="http://en.wikipedia.org/wiki/Big_O_notation"> O(1) </a> memory usage. In this case a plain lexical sort suffices, since it brings identical names together.
<p>For smaller data sets, stick with the usual hash approach. If you happen to <em>know</em> that the data will arrive sorted <em>anyhow</em>, then you can use the hashless approach for smaller data too, but it is probably not worth the loss of generality.
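The usual hash approach might look like the following sketch. It reuses the field layout and 10% raise from the script above; the sample records are copies of the data above, held in an array here rather than <code>__DATA__</code> so the example stays self-contained:

```perl
use strict;
use warnings;

# Sample records in the same name:phone:address:date:salary format as above.
my @records = (
    'Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500',
    'Norma Corder:397-857-2735:74 Pine Street, Dearborn, MI 23874:3/28/45:245700',
    'Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500',
);

my %seen;
my @unique;
for my $line (@records) {
    my ($name, $phoneno, $address, $date, $salary) = split /:/, $line;
    next if $seen{$name}++;            # skip any name already output
    $salary += ($salary * 10) / 100;   # same 10% raise as above
    push @unique, "$name:$phoneno:$address:$date:$salary";
}
print "$_\n" for @unique;
```

Unlike the adjacent-duplicate version, this works however the input is ordered, at the cost of one hash entry per distinct name.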
<p>There is also the option of using Perl's own <a href="http://perldoc.perl.org/functions/sort.html"> sort </a>, but this is usually not a good option: it must read the entire data set into memory before sorting, which is exactly what the external-sort approach avoids.
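For comparison, a sketch of that in-Perl sort option (the sample records again copy the data above, with one duplicate): sorting first makes duplicates adjacent, so the same adjacent-duplicate test applies afterwards, but note that every line sits in memory at once.

```perl
use strict;
use warnings;

# All lines must be held in memory before sort can run:
# O(n) space and O(n log n) time, versus O(1) space for the streaming version.
my @lines = (
    "Tommy Savage:408-724-0140:1222 Oxbow Court, Sunnyvale,CA 94087:5/19/66:34200\n",
    "Barbara Kerz:385-573-8326:832 Ponce Drive, Gary, IN 83756:12/15/46:268500\n",
    "Tommy Savage:408-724-0140:1222 Oxbow Court, Sunnyvale,CA 94087:5/19/66:34200\n",
);

my $seen = '';
my @out;
for my $line (sort @lines) {       # lexical sort makes duplicates adjacent
    my ($name) = split /:/, $line, 2;
    next if $seen eq $name;        # now the adjacent-duplicate test works
    $seen = $name;
    push @out, $line;
}
print @out;
```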