http://qs321.pair.com?node_id=1131432


in reply to Re^3: Random shuffling
in thread Random shuffling

Hi Laurent and BrowserUK - thank you for your replies. I am not trying to match the sequences to the query myself, the software I am using does it and returns the score. And BrowserUK's claim that genomicists have no use of score is not true to say the least. While it may be true that for some tasks, it might become necessary from a practical point of view, to impose a threshold score, it doesn't mean scores do not matter. Quite the contrary.... prime case in point - the BLAST tool. OK, I am going on a tangent, so back to the question at hand:

BrowserUK, I am sure understand your point about randomness being randomness, so my bone of contention or argument is NOT about how many numbers of shuffles, because it is now clear to me that doing it multiple times is not better, but a waste of time, no argument there

I am getting more numbers of matches in the shuffled DNA sequences than in the intact DNA sequences! This is an absurd result, which can only mean that there is so much noise being picked up by the software and reported as a match from intact sequences. When the sequence is shuffled, there is more noise that is being generated due to the shuffle per se, and since the software does not do a good job of discrimination, it reports a higher number of matches! This is the only conclusion I can arrive at based on my discussion with you folks. DNA random shuffling that I am doing is not a problem, the approach the software used in separating signal from noise is not efficient enough, is what I am thinking....

Unless any of you folks have a different opinion about my conclusion regarding the software I am using, and why it is reported more number of matches despite DNA sequence random shuffling, I consider this matter closed. Thanks to all who participated in this discussion, I am grateful for your inputs, suggestions, thoughts and code. Cheers!