http://qs321.pair.com?node_id=911999


in reply to Virus toy model^2

Do you care about the individual nodes, or only about each genome sequence currently present and how many copies of it exist? (I'm not a geneticist, so pardon any terminology mess-ups.)

At the mutation rates and generation counts in your example, you are dealing with a very large number of values. As was previously mentioned, 5**1000 is quite large. Is there any die-off rate or fitness test at each generation that could be used to prune the dataset?

Perhaps a more manageable structure (although not at these rates and generations) would be a hash or HoH (hash-of-hashes), where each key is an ACGT sequence and the value is a hash holding the current number of viruses with that sequence, plus any other bookkeeping information. If you don't need other bookkeeping information, a simple hash of sequence to count would do. The worst storage case (if I am thinking through this correctly) is when every virus has a distinct sequence; the hash then has one entry per virus, so it should be no worse, up to a scaling factor, than an array of all of the viruses (not that this value is small). If each mutation must work in groups of 3 ACGT values, perhaps you could even store your string as a bitstring of some sort: each group has 4**3 == 64 possible values, so it fits in 6 bits.
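A rough sketch of what that HoH and a packed key might look like, under some assumptions of mine: the names pack_seq, unpack_seq, %population and the first_seen field are made up for illustration, and the 2-bits-per-base encoding is only one possible choice. Note that a packed string is only safe as a hash key if all genomes have the same length, since the last byte is zero-padded.

```perl
use strict;
use warnings;

# Hypothetical 2-bit code for each base (an assumption, not from the post).
my %CODE = (A => 0, C => 1, G => 2, T => 3);
my @BASE = qw(A C G T);

# Pack an ACGT string into a byte string, 2 bits per base,
# so a group of 3 bases occupies 6 bits.
sub pack_seq {
    my ($seq) = @_;
    my $bits = '';
    my $i    = 0;
    vec($bits, $i++, 2) = $CODE{$_} for split //, $seq;
    return $bits;
}

# Unpacking needs the base count, because the final byte may be padded.
sub unpack_seq {
    my ($bits, $len) = @_;
    return join '', map { $BASE[ vec($bits, $_, 2) ] } 0 .. $len - 1;
}

# Hash-of-hashes: key is the sequence, value holds the count plus
# whatever bookkeeping is wanted (first_seen is a made-up example field).
my %population;
my $seq = 'ACGTTGCA';
$population{$seq} = { count => 1, first_seen => 0 };

# Eight bases pack into two bytes and survive a round trip.
my $packed = pack_seq($seq);
printf "%d bases -> %d bytes\n", length $seq, length $packed;
print unpack_seq($packed, length $seq), "\n";
```

Whether the packing is worth the trouble depends on genome length; for short sequences the plain string key is simpler and easier to debug.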

Death of a virus would be a -- operation on the count for its sequence, and a successful mutation would be a ++ operation on the count for the new sequence.
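Something like the following is what I have in mind for the simple-hash case. It is only a sketch: birth, death and %count are names I made up, and deleting a key once its count reaches zero is my own addition to keep extinct sequences from occupying memory.

```perl
use strict;
use warnings;

my %count;    # sequence => number of living viruses with that sequence

# A new virus (a copy or a successful mutant) appears: ++ on its sequence.
sub birth {
    my ($seq) = @_;
    $count{$seq}++;
    return;
}

# A virus dies: -- on its sequence. Drop the key when the count hits
# zero so extinct sequences stop taking up space.
sub death {
    my ($seq) = @_;
    return unless $count{$seq};
    delete $count{$seq} if --$count{$seq} == 0;
    return;
}

birth('ACGT') for 1 .. 3;
birth('ACGA');    # a mutant of ACGT
death('ACGT');
death('ACGA');    # the mutant line goes extinct

printf "%s => %d\n", $_, $count{$_} for sort keys %count;
# prints: ACGT => 2
```

With this layout, memory scales with the number of distinct sequences alive rather than the number of viruses, which is where any pruning by die-off or fitness would pay off.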

--MidLifeXis