Cliques solution pertinent to my use case

This refers to the hardness of finding cliques in a graph when the number of nodes becomes high. My use case is that I am trying to form cliques within a cluster where the sum of relationships is maximum.

Background & definition (anthropomorphized for ease of understanding):

Cluster: A number of persons where each person is related to at least one another person. The strength of the relationship is a number, i.e. Si-j is the strength of the relationship between person i & person j (Symmetric: Si-j = Sj-i). All combinations need not be present - if person x & person y have no relationship then Sx-y does not exist. No relationship with self - Si-i does not exist as well.

Problem: Given a list of relationships Si-j: i from 1 to N, for each i some j from 1 to N, i!= j; (graph with the edges giving the strength of the relationship). Find those groups (cliques) where each member is connected to each other and the sum of the relationship is maximum, e.g. if the persons are as follows:

Person-1  Person-2  Relationship-strength
A         B         92
A         C         7
B         C         2
C         D         88
[download]

Then the groups formed would be (A,B) & (C,D) for a sum of relationships of 180 (92 + 88).

Any other combination would have a lesser sum, e.g. (A,B,C) & (D) would have a sum of relationships of 101 (92 + 7 + 2).

How I did it: Convert the list of Si-j's into an integer linear programming problem and call a free solver lp_solve. Got good results - out of about 40,000 problem sets (clusters) only about 20 timed out. This on an ancient 3'rd gen i7 laptop with 4GB RAM. About another 10 were "very" large (more than 2,000 edges). Gives me confidence that use of a commercial solver and a larger timeout period may give even better results.

Linear programming problem formulation left out here as it is more an algorithm issue rather than a Perl issue. We can discuss if interested. Off forum perhaps?

Hope this is interesting. And, perchance, if it is useful for anyone then it would be the icing on the cake!

Comment on Cliques solution pertinent to my use case Download Code

Replies are listed 'Best First'.
Re: Cliques solution pertinent to my use case by Fletch (Bishop) on Jul 07, 2021 at 16:30 UTC
I'd bet something like neo4j would be able to do this fairly quickly. Never actually used it but from (minimal) reading this type thing is right in its wheelhouse and it's got a REST api (and there's some promising hits on CPAN). The cake is a lie. The cake is a lie. The cake is a lie.	[reply]
Re^2: Cliques solution pertinent to my use case by Sanjay (Sexton) on Jul 15, 2021 at 15:43 UTC
Thanks for the tip. Will investigate after some exposure to neo4j.	[reply]
Re: Cliques solution pertinent to my use case by The Perlman (Scribe) on Jul 07, 2021 at 17:56 UTC
You asked this before in Sub set where all are connected . What's the difference now.? - Ron	[reply]
Re^2: Cliques solution pertinent to my use case by parv (Parson) on Jul 07, 2021 at 18:31 UTC
This time seems to be weight-y.	[reply]
Re^2: Cliques solution pertinent to my use case by Sanjay (Sexton) on Jul 15, 2021 at 15:39 UTC
It's the same. Just another way of looking at it to get some kind of solution.	[reply]
Re^3: Cliques solution pertinent to my use case by The Perlman (Scribe) on Jul 16, 2021 at 08:18 UTC
That's like crossposting and not really fair to us. - Ron	[reply]
Re: Cliques solution pertinent to my use case by tybalt89 (Monsignor) on Jul 08, 2021 at 23:00 UTC
This was fun :) First part is basically the same as Re^3: Sub set where all are connected and finds all the cliques. Last part looks for duplicated nodes, removes one of them from a clique such that the score is higher. #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11109069 # clique use warnings; use List::Util qw( uniq sum ); my $edges = <<END =~ s/.Relationship.\n//gr; Person-1 Person-2 Relationship-strength A B 92 A C 7 B C 2 C D 88 END $edges =~ s/^(\w+)\s+(\w+)\b/join ' ', sort $1, $2/gem; # fix order print "$edges\n"; my %edges = map +( $_ => qr/(FAIL)/ ), map s/\s+/,/r, $edges =~ /^\w+\s+\w+/gm; my %cliques; my %seen; find( uniq sort $edges =~ /^(\w+)\s+(\w+)\b/gm ); # start with ever +y node sub find { $seen{ my $set = "@_" }++ and return; if( my @out = $set =~ /\b(\w+)\b.+\b(\w+)\b(??{ $edges{"$1,$2"} \|\| " +" })/ ) { for my $node ( @out ) # pair of unconnected nodes, try without eac +h one { @_ > 1 and find( grep $_ ne $node, @_ ); } } else { $cliques{ $set }++; # it is fully con +nected } } my $uniquecliques = ''; for ( sort { length $b <=> length $a } sort +uniq # ignore subsets keys %cliques, map tr/,/ /r, keys %edges ) { my $pattern = " $_ " =~ s/\w+/\\b$&\\b/gr =~ s/ /.?/gr; $uniquecliques =~ /^$pattern$/m or $uniquecliques .= "$_\n"; } print "cliques:\n\n$uniquecliques"; my %scorecache; sub scoreclique { $scorecache{ $_[0] } //= do { my $score = 0; "$_[0]:\n$edges" =~ /\b(\w+)\b.\b(\w+)\b.:.^\1\s+\2\s+(\d+)\b(? +{ $score += $3 })(FAIL)/ms; $score; } } my @cliques = split /\n/, $uniquecliques; my %count; $count{$_}++ for split ' ', $uniquecliques; for my $node ( map +($_) x --$count{$_}, keys %count ) # for all dups { my ($from, $to) = grep $cliques[$_] =~ /\b$node\b/, 0 .. $#cliques; my $fromscore = scoreclique($cliques[$from]) + scoreclique($cliques[$to] =~ s/\b$node\b//r); my $toscore = scoreclique($cliques[$to]) + scoreclique($cliques[$from] =~ s/\b$node\b//r); my $mod = $fromscore < $toscore ? $from : $to; # favor highest score $cliques[$mod] = join ' ', grep $_ ne $node, split ' ', $cliques[$mo +d]; } my $total = sum map scoreclique($_), @cliques; print "\nanswer($total):\n\n"; print "$_\n" for sort grep length, @cliques; __END__ [download] Outputs: `A B 92 A C 7 B C 2 C D 88 cliques: A B C C D answer(180): A B C D` [download] It passes all the test cases that have been provided :)	[reply] [d/l] [select]
Re^2: Cliques solution pertinent to my use case by Sanjay (Sexton) on Jul 15, 2021 at 15:46 UTC
Thanks. Hope to test this solution for large data sets soon.	[reply]


Do you know where your variables are?
	PerlMonks