PerlMonks  

3D test data that exhibits clustering?

by BrowserUk (Patriarch)
on Jun 08, 2011 at 11:19 UTC ( [id://908684]=perlquestion )

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing an algorithm to find clusters in 3D data, and I need a source of data to test it with.

With purely random datasets, it is impossible to say whether the cluster detection is good or bad. And every attempt I tried so far to generate "random clusters" hasn't really worked either.

So, does anyone know of any freely available 3D datasets that are known to exhibit clustering? Or if you have thoughts on how to generate same?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re: 3D test data that exhibits clustering?
by moritz (Cardinal) on Jun 08, 2011 at 11:32 UTC
    Or if you have thoughts on how to generate same?

    You can generate a few triples of random numbers and use them as cluster centers. Then, for each center, you can generate a random number of points that lie close to it.

    For example, you can use a Gaussian distribution around the centers. For that you need normally distributed random numbers, which you can generate with the Box–Muller transform from the uniformly distributed random numbers that Perl's rand generates.
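    A minimal sketch of the Box–Muller step, written in Python for brevity (in Perl, rand would play the role of random()). The function names box_muller and gaussian_point are illustrative, not from any library, and using one sigma for all three axes is an assumption.

```python
import math
import random

def box_muller(rng=random):
    """Return two independent standard-normal values from two uniform draws."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

def gaussian_point(center, sigma, rng=random):
    """Return one 3D point normally distributed around center (std dev sigma per axis)."""
    z0, z1 = box_muller(rng)
    z2, _ = box_muller(rng)  # the fourth normal value is simply discarded
    return tuple(c + sigma * z for c, z in zip(center, (z0, z1, z2)))
```

    Calling gaussian_point((10, 20, 30), 2.5) repeatedly yields a roughly spherical blob around that center; a different sigma per axis would give elongated clusters.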

    Depending on the data you want to emulate, you might also want to add a number of totally random, non-clustered points.
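    Putting those steps together, one possible generator looks like the sketch below. It is in Python and uses the standard library's random.gauss for the normal deviates instead of a hand-written Box–Muller transform. The function name, the parameter defaults, and the convention of labelling noise points -1 are all assumptions made for illustration.

```python
import random

def make_clustered_points(n_clusters=5, points_per_cluster=(50, 200),
                          sigma=(2.0, 8.0), n_noise=100,
                          extent=100.0, seed=None):
    """Return (points, labels); label -1 marks unclustered noise points."""
    rng = random.Random(seed)
    points, labels = [], []
    for label in range(n_clusters):
        # Random center inside the cube [0, extent]^3.
        center = [rng.uniform(0.0, extent) for _ in range(3)]
        # Each cluster gets its own spread and its own random size.
        spread = rng.uniform(*sigma)
        for _ in range(rng.randint(*points_per_cluster)):
            points.append(tuple(rng.gauss(c, spread) for c in center))
            labels.append(label)
    # Uniform background points that belong to no cluster.
    for _ in range(n_noise):
        points.append(tuple(rng.uniform(0.0, extent) for _ in range(3)))
        labels.append(-1)
    return points, labels
```

    Because the labels record which center produced each point, the output of a cluster detector can be scored against them, which is the thing purely random data cannot offer.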

Re: 3D test data that exhibits clustering?
by Eliya (Vicar) on Jun 08, 2011 at 11:41 UTC
    And every attempt I tried so far to generate "random clusters" hasn't really worked either.

    What exactly have you tried and why hasn't it worked?

    The approach moritz suggests (or variations thereof) seems to be rather straightforward, and given the brilliance of mind you've often exhibited here, I cannot believe you haven't already thought of it... :)   So, what's wrong with it?

      What exactly have you tried and why hasn't it worked?

      I tried generating random points around a set of random starting points (without moritz' enhancement of a normal distribution around those starting points), but it generates sets like these. (Color coded for start point.)

      As you can see, you tend to get either very concentrated groupings that are widely separated, or widely spread groups that almost entirely overlap. Neither is representative of the kind of plots you get from real datasets that exhibit clustering.
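      The two regimes described here can be reproduced with a small Python sketch. It assumes the offsets are drawn uniformly from a cube of half-width spread around each start point, which is a guess at the method rather than the code actually used. overlap_fraction is a hypothetical helper that counts points lying nearer to a foreign start point than to their own.

```python
import random

def uniform_clusters(centers, n_per_cluster, spread, rng):
    """Scatter points uniformly in a cube of half-width spread around each center."""
    points = []
    for label, center in enumerate(centers):
        for _ in range(n_per_cluster):
            p = tuple(c + rng.uniform(-spread, spread) for c in center)
            points.append((p, label))
    return points

def overlap_fraction(points, centers):
    """Fraction of points lying closer to some other center than to their own."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    strays = 0
    for p, label in points:
        nearest = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
        if nearest != label:
            strays += 1
    return strays / len(points)
```

      With spread much smaller than the distance between centers the fraction is zero (tight, isolated blobs); with spread much larger it climbs towards one half for two centers (near-total overlap). The interesting test data lies in between, where the ratio of spread to center separation is of order one.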

      moritz' enhancement might improve things somewhat--I'm trying it now--but if there were one or a few real datasets kicking around somewhere it would give me more confidence that I was performing a real test.


Re: 3D test data that exhibits clustering?
by salva (Canon) on Jun 08, 2011 at 15:47 UTC
    Some stellar database?

    For instance, a quick Google search reveals NOMAD.

      Nice idea, but there is no third dimension (distance) with astro data.



        Plenty of star databases include distance estimates. They're based on trigonometric parallax, absolute-brightness phenomena, redshift, etc.

        Stellarium wouldn't be so much fun without it.

        --Daniel

        I am sure that some databases include the distance, though maybe not with the resolution you need.
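        If a catalogue does supply a distance (or a parallax) alongside the sky position, turning each entry into a 3D point is the usual spherical-to-Cartesian conversion. The Python sketch below assumes right ascension and declination in degrees and parallax in milliarcseconds; real catalogues differ in units and column names, so check the catalogue's own documentation. Both function names are illustrative.

```python
import math

def star_to_xyz(ra_deg, dec_deg, distance):
    """Convert right ascension, declination (degrees) and distance to Cartesian x, y, z."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    x = distance * math.cos(dec) * math.cos(ra)
    y = distance * math.cos(dec) * math.sin(ra)
    z = distance * math.sin(dec)
    return x, y, z

def parallax_to_parsecs(parallax_mas):
    """Distance in parsecs from a trigonometric parallax in milliarcseconds."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive to give a distance")
    return 1000.0 / parallax_mas
```

        The output coordinates are in whatever unit the distance uses. Note that small parallaxes carry large relative errors, so distant stars will be smeared along the line of sight.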
Re: 3D test data that exhibits clustering?
by salva (Canon) on Jun 08, 2011 at 16:00 UTC
    Some algorithm based on the Artificial Termites?

    For instance, generate some random points in a 3D space. Then simulate some termites moving randomly over that space: a termite can pick up a point when one is near enough, and then drop it when another point is near enough (maybe using some probability functions).
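    A rough Python sketch of that idea follows. It is a simplified illustration written for this description, not the algorithm of any published module: pick-up and drop are deterministic rather than probabilistic, and every name and default (reach, stride, and so on) is an assumption.

```python
import random

def termite_clusters(n_points=200, n_termites=20, steps=2000,
                     extent=100.0, reach=5.0, stride=3.0, seed=None):
    """Let random-walking termites carry points towards other points."""
    rng = random.Random(seed)

    def rand_pos():
        return [rng.uniform(0.0, extent) for _ in range(3)]

    def near(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) <= reach ** 2

    wood = [rand_pos() for _ in range(n_points)]  # points lying in the space
    termites = [{"pos": rand_pos(), "load": None} for _ in range(n_termites)]
    for _ in range(steps):
        for t in termites:
            # Random step, clamped so the termite stays inside the box.
            t["pos"] = [min(extent, max(0.0, c + rng.uniform(-stride, stride)))
                        for c in t["pos"]]
            hits = [i for i, w in enumerate(wood) if near(t["pos"], w)]
            if not hits:
                continue
            if t["load"] is None:
                t["load"] = wood.pop(rng.choice(hits))  # pick up a nearby point
            else:
                wood.append(list(t["pos"]))  # drop the load beside the point found
                t["load"] = None
    # Termites still carrying at the end put their load down where they stand.
    wood.extend(list(t["pos"]) for t in termites if t["load"] is not None)
    return [tuple(w) for w in wood]
```

    The number of points is conserved; only their positions change. How pronounced the piles become depends on steps, reach, and the termite count, so those are worth tuning while plotting the output.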

      At least in 2D, it can generate nice results. Red and green dots are the termites, blue dots are the wood (the clustered points).

      The program that generates it is available from Github here.

      update: now, also available from CPAN as AI::Termites.

        Why two colors of termite?

Approved by Corion
Front-paged by Corion