Maybe it hasn't been written because the code is fairly trivial, but the details are highly application-specific?
Something like the following should do the bucketing. It records the start and end offsets of each line, keyed by the line's lowercased first character:
use strict;
use warnings;

# Lowercased first character => list of [start, end] offsets, one pair per line
my %buckets;
my $lineStart = tell DATA;

while (my $line = <DATA>) {
    my $lineEnd = tell DATA;
    chomp $line;
    push @{$buckets{lc substr $line, 0, 1}}, [$lineStart, $lineEnd]
        if length $line;
    $lineStart = $lineEnd;    # advance past blank lines too
}

for my $key (sort keys %buckets) {
    my @pairs = map {"@$_"} @{$buckets{$key}};
    print "$key: ", join(', ', @pairs), "\n";
}
__DATA__
Ok, let's say file A has a series of strings, one per line. Let's say that file
B has a series of strings, one per line.
The goal is, for each line in A, to return the best match from B using a
subroutine named fuzzy_match, a function that takes two strings and returns a
float from 0 to 1.
Now, let's assume that file B is enormous, making the prospect of applying
fuzzy_match to each member infeasible. But let's also assume that the first
character of each member of B will always be the best result from fuzzy_match
for A. This means that instead of looking through all of B, you simply need to
retrieve all records from B which start with the same first letter as the
current record in A.
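Prints (start and end offsets per line; DATA reads from the script file itself, so the exact numbers depend on the script's layout):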
b: 458 499
c: 822 900, 1053 1074
f: 651 670, 746 822, 900 979
n: 670 746
o: 378 458
r: 979 1053
s: 573 651
t: 499 573
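
The offset pairs only pay off once you seek back into the file to pull out one bucket's lines. Here is a minimal sketch of that retrieval step, assuming B is opened on an ordinary seekable filehandle. The names build_index, bucket_lines and $b_data are made up for illustration, and the in-memory file just keeps the example self-contained:

```perl
use strict;
use warnings;

# Build the index: lowercased first character => list of [start, end] offsets.
sub build_index {
    my ($fh) = @_;
    my %buckets;
    my $start = tell $fh;
    while (my $line = <$fh>) {
        my $end = tell $fh;
        chomp(my $text = $line);
        push @{ $buckets{ lc substr($text, 0, 1) } }, [$start, $end]
            if length $text;
        $start = $end;    # advance past blank lines too
    }
    return \%buckets;
}

# Fetch every line in one bucket by seeking to each recorded offset pair.
sub bucket_lines {
    my ($fh, $buckets, $key) = @_;
    my @lines;
    for my $pair (@{ $buckets->{lc $key} || [] }) {
        my ($start, $end) = @$pair;
        seek $fh, $start, 0 or die "seek failed: $!";
        read $fh, my $text, $end - $start;
        chomp $text;
        push @lines, $text;
    }
    return @lines;
}

# Stand-in for file B (hypothetical sample data).
my $b_data = "banana\ncherry\nBlueberry\n\napple\ncranberry\n";
open my $fh, '<', \$b_data or die "open failed: $!";

my $index = build_index($fh);
print join(', ', bucket_lines($fh, $index, 'b')), "\n";   # banana, Blueberry
```

For each record in A you would then run fuzzy_match only against bucket_lines($fh, $index, substr($record, 0, 1)) rather than against all of B.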
Perl is environmentally friendly - it saves trees