Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Re: Finding Combinations of Pairs

by kyle (Abbot)
on Jan 14, 2009 at 03:46 UTC ( [id://736136]=note: print w/replies, xml ) Need Help??


in reply to Finding Combinations of Pairs

I think your spidey sense serves you well.

This smells a little like homework, so I'm putting my solution in spoiler tags.

You should come up with your pairs, and for each pair keep a count (in a hash) of how many times that pair has been seen. Each key of the hash will be a pair, and each value will be the count of occurrences. Once you have that, you can sort the keys by the values and come up with the pair that appears the most. I suspect every one of these tasks has been the subject of some post you could find with Super Search.

Replies are listed 'Best First'.
Re^2: Finding Combinations of Pairs
by tilly (Archbishop) on Jan 14, 2009 at 05:15 UTC
    Tradition says that when you answer homework you obfuscate your answer somewhat so that it won't get a good mark. Like this:
    #! /usr/bin/perl -w use strict; my %pair_count; for (<>) { $pair_count{$_}++ for get_pairs(grep length $_, split /\s/, $_); } my $comp = sub ($$) {-($pair_count{$_[0]} <=> $pair_count{$_[1]})}; print((sort $comp keys %pair_count)[0], $/); sub get_range_iterator { my ($start, $end) = @_; return sub { if ($start <= $end) { return $start++; } else { return; } } } sub get_pairs { my $outer_iter = get_range_iterator(0, $#_); my $i = $outer_iter->(); my @answer; while (defined($i)) { my $inner_iter = get_range_iterator($i+1, $#_); my $j = $inner_iter->(); while (defined $j) { push @answer, join " ", sort @_[$i, $j]; $j = $inner_iter->(); } $i = $outer_iter->(); } return @answer; }
      ...when you answer homework...

      Or else you offer an answer that appears to be well laid out and straight-forward, but scales abysmally...

      • another intruder with the mooring in the heart of the Perl

      tilly,
      Tradition says that when you answer homework you obfuscate your answer somewhat so that it won't get a good mark.

      It doesn't look too much like homework to me but another idea would be to use code obviously beyond that of the course?

      Cheers - L~R

      I like this solution, because it's so simple and efficient:

      #!/usr/bin/perl -w use strict; my @lines = <DATA>; my (%words, %count); @words{map {split ' '} @lines} =1; my @uniq = keys %words; # create list of unique words foreach my $i (0 .. $#uniq - 1) { foreach my $j ($i+1 .. $#uniq) { # count the pairs do { $count{"$uniq[$i] $uniq[$j]"}++ if /$uniq[$i]/ an +d /$uniq[$j]/ } foreach @lines; } } print map "$_ : $count{$_}\n", # print the pairs in s +orted order sort {$count{$a} <=> $count{$b}} keys %count; __DATA__ dog monkey cat cat ball stone monkey iron cat zoo

      (NB see parent node)

        andye,
        like this solution, because it's so simple and efficient:

        It has at least one bug and I am not sure why you think it is efficient.

        First the bug. You have /$uniq[$i]/. This can do the wrong thing for at least two reasons. The first is that it will match "and" in the line of "where is the sand" even though the word "and" never appears. The second reason is that you haven't use \Q\E or quotemeta to ensure any metacharacters are escaped. It doesn't take case into consideration either, but I am not sure any of the solutions do (including my own).

        Another potential bug is the fact that you don't consider that in the line "hello world goodbye cruel world" that each pairing of world should double and not be counted once. I am not sure if this is or isn't a requirement but it is a bug in one intepretation.

        As to why it isn't efficient. This can be done in a single pass and yet your solution has you going through every line in the data (N^2 - N) / 2 times (where N = # of unique words). In other words, you check every single pairing of unique word against every single line even if the word doesn't appear on that line.

        Cheers - L~R

Re^2: Finding Combinations of Pairs
by zod (Scribe) on Jan 14, 2009 at 07:23 UTC
    Thanks for the reply. I didn't consider combining the pairs into a single value. Next time I will. Thanks.
      BTW - and being serious now - you don't actually need to combine them into a single value.

      If you do something like:

      $h->{$word1}->{$word2}++
      instead of
      $h->{"$word1 $word2"}++
      in the centre of the loops, then you end up with a useful tree-like structure, which gives you a quick index of, for each word, how many times each other word appears in combination with it.

      Try it and use Data::Dumper to print the output, and you'll see what I mean.

      Although I appreciate that may not be what the homework called for. <grin>

      Best wishes, andy.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://736136]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (9)
As of 2024-03-28 10:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found