I didn't fully understand your suggestion for how you might go about it, but below is the simplistic approach I'd use as a starting point. The key features of the scan are: a) it builds two data structures in parallel, one holding the known sets of equivalent rows and the other holding the elements each of those sets contains; b) when a new row matches one or more existing sets, the entries for those sets are merged in both structures.
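#!/usr/bin/perl -w
use strict;

my $data = read_input();
my $sets = scan($data);
# Each set is a list of 0-based row numbers; the sets themselves come
# out in hash order, so their order can vary between runs.
for (@$sets) {
    printf "{ %s }\n", join ' ', sort { $a <=> $b } @$_;
}

# Read one row per line from the DATA filehandle (the lines after __END__).
sub read_input {
    my @data;
    local $_;
    while (<DATA>) {
        # split ' ' skips leading whitespace, so no empty first field
        # can make unrelated rows appear to match
        push @data, [ split ' ' ];
    }
    \@data;
}

sub scan {
    my $data = shift;
    # %matches: set id => { element => 1 } for every element in the set
    # %results: set id => [ row numbers belonging to the set ]
    my(%matches, %results);
    for my $index (0 .. $#$data) {
        my @equal;
        my $these = $data->[$index];
        # Find every existing set sharing at least one element with this row
        for my $key (keys %matches) {
            my $compare = $matches{$key};
            if (grep exists $compare->{$_}, @$these) {
                push @equal, $key;
            }
        }
        # Merge those sets, plus this row, into a new set keyed by $index
        $results{$index} = [ $index, map @{ delete $results{$_} }, @equal ];
        $matches{$index} = {
            map(($_ => 1), @$these),
            map %{ delete $matches{$_} }, @equal
        };
    }
    [ values %results ];
}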
__END__
a b c d e
f b g
h i j k l
m f
If this isn't fast enough, my first thought to improve it would be to find some way of using bit vectors to represent the elements, so that a match can be checked with a single bitwise AND of two strings (Perl's & operator works bitwise on strings) and sets can be merged with a bitwise OR. To do that, you'd need a way to translate elements into numbers that you can use as bit offsets.
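A rough sketch of that bit-vector idea, assuming elements are simply numbered in order of first appearance (the %offset numbering and the name scan_bits are hypothetical choices for illustration). Each set's elements become a bit string built with vec, a match is a string & that leaves any non-zero byte, and merging is a string |. Whether it actually beats the hash version will depend on your data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: same scan as above, but each set's elements are stored as a
# bit string instead of a hash. %offset (element => bit offset) is a
# hypothetical numbering scheme; any stable numbering would do.
sub scan_bits {
    my $data = shift;
    my (%offset, %matches, %results);
    my $next = 0;
    for my $index (0 .. $#$data) {
        # Set one bit per element of this row, giving each element
        # the next free offset the first time it is seen.
        my $bits = '';
        for my $elem (@{ $data->[$index] }) {
            $offset{$elem} = $next++ unless exists $offset{$elem};
            vec($bits, $offset{$elem}, 1) = 1;
        }
        # An existing set matches if the string AND has a non-NUL byte.
        my @equal = grep { ($matches{$_} & $bits) =~ /[^\0]/ } keys %matches;
        # Merge: row lists are concatenated, bit strings are OR-ed.
        $results{$index} = [ $index, map @{ delete $results{$_} }, @equal ];
        $bits |= delete $matches{$_} for @equal;
        $matches{$index} = $bits;
    }
    return [ values %results ];
}

# Same sample rows as the __END__ data above.
my @rows = map { [ split ' ' ] } 'a b c d e', 'f b g', 'h i j k l', 'm f';
my @sets = map { [ sort { $a <=> $b } @$_ ] } @{ scan_bits(\@rows) };
# Sort the sets by first row number so the output is repeatable.
for my $set (sort { $a->[0] <=> $b->[0] } @sets) {
    printf "{ %s }\n", join ' ', @$set;
}
```

On the sample rows this prints { 0 1 3 } and then { 2 }, the same grouping the hash-based scan finds.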
However, if there are many elements and most of them appear only once, it may be better to do a prepass to get a list of the repeated elements, and then consider only those repeats in the main loop.
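A minimal sketch of such a prepass (prune_singletons is a hypothetical name): count every element across all rows, then keep only elements seen more than once. An element that occurs only once can't be shared by two rows, so dropping it doesn't change which rows get grouped together. Rows are thinned rather than removed, so row numbers still line up with the input and the result could be passed straight to scan; rows left empty just come out as single-row sets.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: drop every element that occurs only once across all rows,
# keeping each row (possibly empty) in place so indices are unchanged.
sub prune_singletons {
    my $data = shift;
    my %count;
    $count{$_}++ for map @$_, @$data;
    return [ map { [ grep $count{$_} > 1, @$_ ] } @$data ];
}

my @rows = map { [ split ' ' ] } 'a b c d e', 'f b g', 'h i j k l', 'm f';
for my $row (@{ prune_singletons(\@rows) }) {
    print '[', join(' ', @$row), "]\n";
}
```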
Hope this helps,
Hugo