PerlMonks  


I didn't fully understand your suggestion for how you might go about it, but below is the simple approach I'd use as a starting point. The key features of the scan are:

a) it builds two data structures simultaneously, both keyed by set: one (%results) holds the records known to be equivalent, and the other (%matches) holds the elements that those records contain;
b) when a new record shares an element with one or more existing sets, those sets are merged, in both structures, into a single set together with the new record.

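```perl
#!/usr/bin/perl -w
use strict;

my $data = read_input();
my $sets = scan($data);
for (@$sets) {
    printf "{ %s }\n", join ' ', sort { $a <=> $b } @$_;
}

# Read one record per line of DATA; each record is the list of
# whitespace-separated elements on that line (empty fields dropped).
sub read_input {
    my @data;
    local $_;
    while (<DATA>) {
        push @data, [ grep length, split /\s+/ ];
    }
    \@data;
}

# Group record indices into sets of records linked by shared elements.
#   %results: set key => [ record indices in the set ]
#   %matches: set key => { element => 1 } for every element in the set
sub scan {
    my $data = shift;
    my(%matches, %results);
    for my $index (0 .. $#$data) {
        my @equal;
        my $these = $data->[$index];
        # Find every existing set that shares an element with this record.
        for my $key (keys %matches) {
            my $compare = $matches{$key};
            if (grep exists $compare->{$_}, @$these) {
                push @equal, $key;
            }
        }
        # Merge those sets (if any) into a new set keyed by this record.
        $results{$index} = [ $index, map @{ delete $results{$_} }, @equal ];
        $matches{$index} = {
            map(($_ => 1), @$these),
            map %{ delete $matches{$_} }, @equal
        };
    }
    [ values %results ];
}

__END__
a b c
d e f
b g h
i j k
l m f
```

Each output line lists the (0-based) line numbers of the input records that belong to one set; the sets come out in no particular order, because they are taken from values %results.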

If this isn't fast enough, my first thought for improving it would be to represent each set's elements as a bit vector, so that a match can be checked with a single bitwise AND of two strings: any non-zero byte in the result means the two sets share an element. To do that, you'd need a way to translate each element into a number that you can use as its bit offset.
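A minimal sketch of the bit-vector idea, assuming elements are plain strings and that bit offsets are handed out in order of first appearance. The helper names row_to_bits and bits_overlap and the sample records are hypothetical, not part of the code above. It uses Perl's built-in vec to set bits and the string form of & to test for overlap:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: elements get bit offsets in order of first appearance.
my %bit_for;       # element => bit offset (hypothetical mapping)
my $next_bit = 0;  # next unused bit offset

# Build a bit string for one record, setting the bit for each element.
sub row_to_bits {
    my @elements = @_;
    my $bits = '';
    for my $e (@elements) {
        $bit_for{$e} = $next_bit++ unless exists $bit_for{$e};
        vec($bits, $bit_for{$e}, 1) = 1;
    }
    return $bits;
}

# Two bit strings share an element iff their string-wise AND contains a
# byte other than "\0". (Under the 'bitwise' feature, use &. instead.)
sub bits_overlap {
    my ($x, $y) = @_;
    my $common = $x & $y;
    return $common =~ /[^\0]/ ? 1 : 0;
}

# Sample records (assumed data for illustration).
my @rows = ([qw(a b c)], [qw(d e f)], [qw(b g h)]);
my @bits = map { row_to_bits(@$_) } @rows;

for my $pair ([0, 1], [0, 2]) {
    my ($i, $j) = @$pair;
    printf "records %d and %d: %s\n", $i, $j,
        bits_overlap($bits[$i], $bits[$j]) ? 'match' : 'no match';
}
```

In the scan itself you would keep one bit string per set and combine two sets with the string | operator when they merge. If the script enables the bitwise feature (for example via use v5.28 or later), the string operators are spelled &. and |. instead.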

However, if there are many elements, most of which appear only once, it may be better to do a prepass to find the repeated elements and then consider only those in the main loop, since an element that appears in only one record can never link two records.
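A sketch of such a prepass, assuming the same array-of-arrays input that read_input() returns; keep_repeats and the sample records are hypothetical. It counts how many records each element occurs in and keeps only elements with a count above one, leaving record positions unchanged so the indices reported by scan() still refer to the original lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop every element that occurs in only one record.
# Such elements can never link two records, so the scan can ignore them.
sub keep_repeats {
    my ($rows) = @_;
    my %row_count;    # element => number of records containing it
    for my $row (@$rows) {
        my %seen;
        # Count each element at most once per record.
        $row_count{$_}++ for grep { !$seen{$_}++ } @$row;
    }
    # Keep one (possibly empty) record per input record, in the same order.
    return [ map { [ grep { $row_count{$_} > 1 } @$_ ] } @$rows ];
}

# Sample records (assumed data for illustration).
my @rows = ([qw(a b c)], [qw(d e f)], [qw(b g h)], [qw(i j k)], [qw(l m f)]);
my $filtered = keep_repeats(\@rows);
printf "%d: %s\n", $_, join ' ', @{ $filtered->[$_] } for 0 .. $#$filtered;
```

Passing the filtered records to scan() instead of the originals should give the same sets, because only elements that cannot link two records were removed; a record left with no elements simply ends up in a set of its own.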

Hope this helps,

Hugo


In reply to Re: Building Networks of Matches by hv
in thread Building Networks of Matches by bowsie
