Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Cliques solution pertinent to my use case

by Sanjay (Sexton)
on Jul 07, 2021 at 15:57 UTC ( [id://11134769]=perlmeditation: print w/replies, xml ) Need Help??

This refers to the hardness of finding cliques in a graph when the number of nodes becomes high. My use case is that I am trying to form cliques within a cluster where the sum of relationships is maximum.

Background & definition (anthropomorphized for ease of understanding):

Cluster: A number of persons where each person is related to at least one another person. The strength of the relationship is a number, i.e. Si-j is the strength of the relationship between person i & person j (Symmetric: Si-j = Sj-i). All combinations need not be present - if person x & person y have no relationship then Sx-y does not exist. No relationship with self - Si-i does not exist as well.

Problem: Given a list of relationships Si-j: i from 1 to N, for each i some j from 1 to N, i!= j; (graph with the edges giving the strength of the relationship). Find those groups (cliques) where each member is connected to each other and the sum of the relationship is maximum, e.g. if the persons are as follows:

Person-1 Person-2 Relationship-strength A B 92 A C 7 B C 2 C D 88

Then the groups formed would be (A,B) & (C,D) for a sum of relationships of 180 (92 + 88).

Any other combination would have a lesser sum, e.g. (A,B,C) & (D) would have a sum of relationships of 101 (92 + 7 + 2).

How I did it: Convert the list of Si-j's into an integer linear programming problem and call a free solver lp_solve. Got good results - out of about 40,000 problem sets (clusters) only about 20 timed out. This on an ancient 3'rd gen i7 laptop with 4GB RAM. About another 10 were "very" large (more than 2,000 edges). Gives me confidence that use of a commercial solver and a larger timeout period may give even better results.

Linear programming problem formulation left out here as it is more an algorithm issue rather than a Perl issue. We can discuss if interested. Off forum perhaps?

Hope this is interesting. And, perchance, if it is useful for anyone then it would be the icing on the cake!

Replies are listed 'Best First'.
Re: Cliques solution pertinent to my use case
by Fletch (Bishop) on Jul 07, 2021 at 16:30 UTC

    I'd bet something like neo4j would be able to do this fairly quickly. Never actually used it but from (minimal) reading this type thing is right in its wheelhouse and it's got a REST api (and there's some promising hits on CPAN).

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

      Thanks for the tip. Will investigate after some exposure to neo4j.

Re: Cliques solution pertinent to my use case
by The Perlman (Scribe) on Jul 07, 2021 at 17:56 UTC

      This time seems to be weight-y.

      It's the same. Just another way of looking at it to get some kind of solution.

        That's like crossposting and not really fair to us.
        - Ron
Re: Cliques solution pertinent to my use case
by tybalt89 (Monsignor) on Jul 08, 2021 at 23:00 UTC

    This was fun :)

    First part is basically the same as Re^3: Sub set where all are connected and finds all the cliques.

    Last part looks for duplicated nodes, removes one of them from a clique such that the score is higher.

    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11109069 # clique use warnings; use List::Util qw( uniq sum ); my $edges = <<END =~ s/.*Relationship.*\n//gr; Person-1 Person-2 Relationship-strength A B 92 A C 7 B C 2 C D 88 END $edges =~ s/^(\w+)\s+(\w+)\b/join ' ', sort $1, $2/gem; # fix order print "$edges\n"; my %edges = map +( $_ => qr/(*FAIL)/ ), map s/\s+/,/r, $edges =~ /^\w+\s+\w+/gm; my %cliques; my %seen; find( uniq sort $edges =~ /^(\w+)\s+(\w+)\b/gm ); # start with ever +y node sub find { $seen{ my $set = "@_" }++ and return; if( my @out = $set =~ /\b(\w+)\b.+\b(\w+)\b(??{ $edges{"$1,$2"} || " +" })/ ) { for my $node ( @out ) # pair of unconnected nodes, try without eac +h one { @_ > 1 and find( grep $_ ne $node, @_ ); } } else { $cliques{ $set }++; # it is fully con +nected } } my $uniquecliques = ''; for ( sort { length $b <=> length $a } sort +uniq # ignore subsets keys %cliques, map tr/,/ /r, keys %edges ) { my $pattern = " $_ " =~ s/\w+/\\b$&\\b/gr =~ s/ /.*?/gr; $uniquecliques =~ /^$pattern$/m or $uniquecliques .= "$_\n"; } print "cliques:\n\n$uniquecliques"; my %scorecache; sub scoreclique { $scorecache{ $_[0] } //= do { my $score = 0; "$_[0]:\n$edges" =~ /\b(\w+)\b.*\b(\w+)\b.*:.*^\1\s+\2\s+(\d+)\b(? +{ $score += $3 })(*FAIL)/ms; $score; } } my @cliques = split /\n/, $uniquecliques; my %count; $count{$_}++ for split ' ', $uniquecliques; for my $node ( map +($_) x --$count{$_}, keys %count ) # for all dups { my ($from, $to) = grep $cliques[$_] =~ /\b$node\b/, 0 .. $#cliques; my $fromscore = scoreclique($cliques[$from]) + scoreclique($cliques[$to] =~ s/\b$node\b//r); my $toscore = scoreclique($cliques[$to]) + scoreclique($cliques[$from] =~ s/\b$node\b//r); my $mod = $fromscore < $toscore ? $from : $to; # favor highest score $cliques[$mod] = join ' ', grep $_ ne $node, split ' ', $cliques[$mo +d]; } my $total = sum map scoreclique($_), @cliques; print "\nanswer($total):\n\n"; print "$_\n" for sort grep length, @cliques; __END__

    Outputs:

    A B 92 A C 7 B C 2 C D 88 cliques: A B C C D answer(180): A B C D

    It passes all the test cases that have been provided :)

      Thanks. Hope to test this solution for large data sets soon.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://11134769]
Approved by GrandFather
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others imbibing at the Monastery: (6)
As of 2024-04-23 15:42 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found