Re^4: Finding Combinations of Pairs

andye,
like this solution, because it's so simple and efficient:

It has at least one bug and I am not sure why you think it is efficient.

First the bug. You have /$uniq[$i]/. This can do the wrong thing for at least two reasons. The first is that it will match "and" in the line "where is the sand" even though the word "and" never appears. The second is that you haven't used \Q\E or quotemeta to ensure any metacharacters are escaped. It doesn't take case into consideration either, but I am not sure any of the solutions do (including my own).
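To make both points concrete, here is a minimal sketch of a stricter match. The helper name has_word is made up for illustration and is not from andye's code. It uses lookarounds rather than \b so that a word ending in a non-word character still matches:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word occurs in $line as a whole word.
sub has_word {
    my ($line, $word) = @_;
    # \Q...\E escapes any regex metacharacters in $word.
    # The lookarounds require that no word character touches
    # either side of the match, so "and" will not match inside "sand".
    return $line =~ /(?<!\w)\Q$word\E(?!\w)/ ? 1 : 0;
}

print has_word('where is the sand', 'and'), "\n";    # prints 0
print has_word('salt and pepper', 'and'), "\n";      # prints 1
print has_word('what is c++ anyway', 'c++'), "\n";   # prints 1
```

Adding the /i modifier would address the case issue, if that is wanted.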

Another potential bug is that you don't consider that, in the line "hello world goodbye cruel world", each pairing with "world" arguably should be counted twice rather than once. I am not sure whether this is a requirement, but it is a bug under one interpretation.

As to why it isn't efficient: this can be done in a single pass, yet your solution goes through every line in the data (N^2 - N) / 2 times (where N = # of unique words). In other words, you check every single pairing of unique words against every single line, even if neither word appears on that line.
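For comparison, a single-pass version might look roughly like the following. This is only a sketch under stated assumptions: words are whitespace-separated, matching is case-sensitive, and a pair is counted at most once per line. The name count_pairs is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count co-occurring word pairs in one pass over the lines.
sub count_pairs {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        # Assumption: split on whitespace and collapse duplicates within
        # the line, so each pair counts at most once per line.
        my %seen;
        my @words = sort grep { !$seen{$_}++ } split ' ', $line;
        # Only pairs that actually appear together on this line are touched.
        for my $i (0 .. $#words - 1) {
            for my $j ($i + 1 .. $#words) {
                $count{"$words[$i] $words[$j]"}++;
            }
        }
    }
    return \%count;
}

my $pairs = count_pairs('hello world', 'goodbye cruel world', 'world hello');
print "$_: $pairs->{$_}\n" for sort keys %$pairs;
```

Each line is read once, and the work per line depends only on the words in that line. If repeated words on a line should count more than once, the de-duplication step is the part that would have to change.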

Cheers - L~R

Replies are listed 'Best First'.
Re^5: Finding Combinations of Pairs by tilly (Archbishop) on Jan 14, 2009 at 20:54 UTC
In the context of a discussion about providing obfuscated and inefficient answers that look reasonable, I assumed that that post was a well-executed joke.
Re^6: Finding Combinations of Pairs by Limbic~Region (Chancellor) on Jan 14, 2009 at 21:26 UTC
tilly, My funny bone is often broken. In spoken conversation I have learned to pick up on many social cues for when not to take something literally, but it is an effort and I commonly get it wrong. Written communication is so much worse. Thanks for the pointer. Cheers - L~R
Re^7: Finding Combinations of Pairs by andye (Curate) on Jan 14, 2009 at 23:04 UTC
No worries, LR. ;) The key issue that I was proud of - on the efficiency stakes - was that it checked for all possible combinations of keywords - even those which did not exist on the same line ever. And that it went through the entire file again for every keyword-pair, i.e. processed the entire file keyword-factorial times. Now that's what I call thorough. The sand/and thing is a genuine, unintended bug though. :) All the best, andy. (BTW, 'NB see parent node' was intended as a nod/wink - perhaps kinda unclear though).

