Thanks for the response! Sorry for not including any code; I haven't even gotten that far yet.
Maybe I can try to better explain what I am doing, if you're interested... The markers are actually genetic sequences (numbered 1 to 138k, each scored yes/no for presence), the items are samples, and the sub-groups are animals. I'm using an R program that uses a Gibbs sampler to look for commonality between the known sub-groups and an unknown sample. The idea is that you can estimate the proportions of the known sub-groups in the unknown sample.
I currently have a large library of known samples that correspond to various sub-groups of animals, but the 138k markers are causing the R script to bog down substantially (4+ days per unknown sample, due to single-core limitations). So I want to choose a subset of the 138k markers to run. Ideally this subset would contain markers that are unique to each sub-group, with an adjustable level of "uniqueness". That is, I'd like to control both how many markers are output per sub-group and how unique they must be relative to the other sub-groups. (By altering parameters, I would be able to request a list of 10k IDs from each sub-group that are 80% dissimilar from every other sub-group, or a list of 5k that are 95% dissimilar, etc.) I definitely need to read up on statistics to figure out what I'm actually asking for!
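
To make the request a bit more concrete, here is a rough sketch of the kind of filter I'm picturing. It is in Python purely for illustration (not the R program), and I haven't run it on the real data. The big assumption is what "dissimilar" means: here I take it to be the gap between how often a marker is present in one sub-group and how often it is present in the most similar other sub-group, which may not be the right statistic. The function names, the `min_gap` and `max_per_group` parameters, and the 0-based marker indices are all made up for the example.

```python
from collections import defaultdict


def marker_frequencies(samples, n_markers):
    """Fraction of samples in each sub-group that carry each marker.

    samples: list of (subgroup_name, set_of_present_marker_indices),
             with marker indices running from 0 to n_markers - 1.
    """
    counts = defaultdict(lambda: [0] * n_markers)
    sizes = defaultdict(int)
    for subgroup, present in samples:
        sizes[subgroup] += 1
        row = counts[subgroup]
        for m in present:
            row[m] += 1
    return {g: [c / sizes[g] for c in counts[g]] for g in counts}


def select_markers(freqs, min_gap, max_per_group):
    """Pick up to max_per_group markers per sub-group.

    Assumed score: a marker's frequency in the sub-group minus its highest
    frequency in any other sub-group. Only markers with score >= min_gap
    are kept, best-scoring first.
    """
    selected = {}
    for g, own in freqs.items():
        others = [f for h, f in freqs.items() if h != g]
        scored = []
        for m, p in enumerate(own):
            gap = p - max((o[m] for o in others), default=0.0)
            if gap >= min_gap:
                scored.append((gap, m))
        # Largest gap first; ties broken by marker index for stable output.
        scored.sort(key=lambda t: (-t[0], t[1]))
        selected[g] = [m for _, m in scored[:max_per_group]]
    return selected


if __name__ == "__main__":
    # Toy library: two sub-groups, two samples each, four markers.
    toy = [
        ("cow", {0, 1}),
        ("cow", {0, 2}),
        ("pig", {1, 3}),
        ("pig", {3}),
    ]
    freqs = marker_frequencies(toy, n_markers=4)
    print(select_markers(freqs, min_gap=0.8, max_per_group=10))
```

With the real library, `min_gap=0.8, max_per_group=10000` would correspond to the "10k IDs that are 80% dissimilar" request, and `min_gap=0.95, max_per_group=5000` to the stricter one, at least under this particular definition of dissimilarity.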