PerlMonks  

Re^2: Randomly biased, random numbers.

by BrowserUk (Patriarch)
on Dec 06, 2013 at 02:32 UTC ( [id://1065881] )


in reply to Re: Randomly biased, random numbers.
in thread Randomly biased, random numbers.

You might want to be more explicit about exactly what characteristics you want in this distribution,

That's hard. Mostly because I definitely do not want any formally defined distribution. Almost exactly the opposite in fact.

The problem with "random" is that, on average, the points will be evenly distributed over the range (a 2D plane in this case). That's almost the exact definition of a PRNG.

Whilst, given enough iterations, all possible distributions will eventually occur -- including those where a majority of the points in the sample tend to be grouped or clustered on one side or in one corner of the plane -- even with a relatively small plane (500x500) and sample size (100), there are so many 'roughly even' distributions and so few 'lopsided' ones that I'd need to run billions of sets to ensure I'd tested a few of the lopsided ones. That's not practical.

So, I'm looking for a way to induce lopsided -- which pretty much means not 'roughly evenly distributed' -- distributions, without prescribing where or why the concentrated and sparse bits will be.

I can't think of any better description than: I want lopsided distributions that are completely randomly generated.

Not good I know, but it's the best I've got.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^3: Randomly biased, random numbers.
by educated_foo (Vicar) on Dec 06, 2013 at 03:19 UTC
    I definitely do not want any formally defined distribution.
    Do you have sample data you can perturb, mix, or otherwise use to generate test data?
    I want lopsided distributions that are completely randomly generated.
    Hm... Maybe transform your PRNG through a random, monotonic, nonlinear mapping? e.g. generate a piecewise-linear function (or spline) in each dimension with steps taken from the PRNG, then generate uniform random points and apply the function to them. I suspect a Real Statistician would scoff, but I am not such a person.
      generate a piecewise-linear function (or spline) in each dimension with steps taken from the PRNG, then generate uniform random points and apply the function to them.

      That sounds like a real possibility -- or rather, it looks like one, having done a search for "spline" and seen a few images. I'm imagining sticking a few (2 or 3 or 4, decided at random) sticks, of random length, into a square of ground at randomly chosen points; and then draping a tarpaulin over them. The height of the tarpaulin at any given point then "influences" the randomly generated xy pairs such that they tend to concentrate around the sticks.

      I haven't a clue how I'd go about it though :( (Offers?:)
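      Read literally, the tarpaulin-and-sticks picture describes a random density surface plus weighted sampling. A minimal sketch in Python (all names, the cone-shaped "drape", and the stick-height range are my own assumptions, not anyone's posted code; acceptance uses the rejection method):

```python
import math
import random

def make_tarpaulin(width, height, n_sticks=3, rng=random):
    """Plant n_sticks at random points with random heights; the tarpaulin
    height at (x, y) is the tallest of (stick height - distance to stick),
    i.e. a cone draped over each stick, clamped at zero elsewhere."""
    sticks = [(rng.uniform(0, width), rng.uniform(0, height),
               rng.uniform(0.3, 1.0) * max(width, height))
              for _ in range(n_sticks)]
    def h(x, y):
        return max(s - math.hypot(x - sx, y - sy) for sx, sy, s in sticks)
    hmax = max(s for _, _, s in sticks)
    return h, hmax

def sample_points(n, width=500, height=500, rng=random):
    """Generate n points that concentrate where the tarpaulin is high."""
    h, hmax = make_tarpaulin(width, height, rng=rng)
    pts = []
    while len(pts) < n:
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        # Accept with probability proportional to the tarpaulin height.
        if rng.random() * hmax < max(h(x, y), 0.0):
            pts.append((x, y))
    return pts
```

      Each run produces a different, unpredictable clustering, which seems to match the "lopsided but randomly generated" requirement.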


        You more or less have it.
        I haven't a clue how I'd go about it though :( (Offers?:)
        For my standard consulting rate? ;-)

        Basically, you want to take a uniform distribution between 0 and M and map it to a non-uniform one between 0 and N, preserving order: the mapped value of I is less than the mapped value of J whenever I < J. One way to do this is to generate a piecewise-linear function, then translate values through that function. In Matlab, it looks something like this for a 10-step function:

        fx = [0,cumsum(unifrnd(0,1,1,10))];   % increasing knot values
        tmp = unifrnd(1,10,1,1e5);            % uniform positions along the knots
        ix = floor(tmp);                      % segment index
        dx = rem(tmp,1);                      % fractional position within the segment
        values = (fx(ix) + (fx(ix+1)-fx(ix)).*dx)./fx(end-1);  % interpolate, normalise
        Using a spline would be more complicated, but would be smoother.
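        For readers without Matlab, here is a rough Python port of the sketch above (function and variable names are mine; it differs slightly from the Matlab in using every generated knot, but the idea -- a random monotone piecewise-linear map applied to uniform draws -- is the same):

```python
import random

def piecewise_linear_samples(n_steps=10, n_samples=100_000, rng=random):
    # Knot values: 0 followed by a running sum of uniform increments,
    # giving a random monotone increasing piecewise-linear map.
    fx = [0.0]
    for _ in range(n_steps):
        fx.append(fx[-1] + rng.random())
    values = []
    for _ in range(n_samples):
        t = rng.uniform(0, n_steps)    # uniform position along the knots
        ix = min(int(t), n_steps - 1)  # segment index
        dx = t - ix                    # fractional part within the segment
        # Linear interpolation between knots, normalised to [0, 1].
        values.append((fx[ix] + (fx[ix + 1] - fx[ix]) * dx) / fx[-1])
    return values
```

        Applying one such map per dimension skews the otherwise uniform x and y coordinates independently.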
Re^3: Randomly biased, random numbers.
by salva (Canon) on Dec 07, 2013 at 09:27 UTC
    That's hard. Mostly because I definitely do not want any formally defined distribution

    Download random pictures from the Internet and use them as the base to generate density functions.

    You may apply some simple transformations (for instance, dynamic range decompression) to obtain more disparate distributions.

      Indeed. That's pretty similar to the ideas I had -- "Eg. grab a random image; process the image with a filter to reduce it to just points of a particular color or hue; or maybe use a Conway's Life type process to manipulate the pixels until groups of similar hues reduce to single points; or a dozen other ideas; and then use those points as my dataset." -- triggered by roboticus' post.

      However, it turns out to be rather more difficult than I imagined.

      I thought of two ways to tackle this approach:

      1. Try to derive the points for my test data directly from the randomly chosen images.

        It's fairly easy to manually pick and apply a few filters to any given image to reduce it to a bunch of discrete pixels -- converting to grey scale, then explosion followed by embossing works well for many images; as does repeatedly applying a high-pass filter until the number of non-black pixels is reduced to a usable number -- but finding a single sequence of filters that produces good datasets from a wide range of images is very hard.

        And even when doing this manually, it is surprising how often, once you have succeeded in reducing the image to discrete pixels, they end up pretty uniformly distributed.

      2. Use the color or luminance or hue of the images to weight the picking of 'random' pixels.

        This is also quite hard to do other than via the rejection method -- pick a random pixel and reject if the chosen attribute is above or below some cut-off value -- which can be very time consuming.

        The only other method I came up with was to construct a 'weight stick'. Eg.

        Say this represents the 2D weights map:

        +--+--+--+--+--+--+--+--+--+--+
        | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
        +--+--+--+--+--+--+--+--+--+--+
        | 0| 5| 5| 4| 3| 3| 2| 2| 1| 0|
        +--+--+--+--+--+--+--+--+--+--+
        | 0| 5|10| 8| 6| 5| 4| 3| 1| 0|
        +--+--+--+--+--+--+--+--+--+--+
        | 0| 3| 6| 5| 5| 5| 5| 4| 2| 0|
        +--+--+--+--+--+--+--+--+--+--+
        | 0| 1| 2| 3| 4| 5| 6| 6| 3| 0|
        +--+--+--+--+--+--+--+--+--+--+
        | 0| 0| 1| 2| 3| 5| 5| 4| 3| 0|
        +--+--+--+--+--+--+--+--+--+--+
        | 0| 0| 0| 1| 2| 4| 3| 3| 2| 0|
        +--+--+--+--+--+--+--+--+--+--+
        | 0| 0| 0| 0| 1| 2| 1| 2| 1| 0|
        +--+--+--+--+--+--+--+--+--+--+
        | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
        +--+--+--+--+--+--+--+--+--+--+

        Then I build a 1D vector containing the (pixel coordinate pair) x its weight:

        ([0,0])x 0, ([0,1])x 0, ([0,2])x 0, ...
        ([0,1])x 0, ([1,1])x 5, ([2,1])x 5, ([3,1])x 4, ([4,1])x 3, ...
        ([0,2])x 0, ([1,2])x 5, ([2,2])x 10, ([3,2])x 8, ...
        ...

        (I packed these into a scalar to save space.)

        Now, to pick pixels, I randomly index into the vector and get one value for every pick. The picking is fast, but the construction is relatively slow. And the higher the range of weight factors, the more memory it takes and the longer it takes to construct, but it works very well.
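        The 'weight stick' described above can be sketched in Python (a hypothetical illustration with names of my choosing; the original packs the pairs into a scalar to save space, which this sketch does not bother with):

```python
import random

def build_weight_stick(weight_map):
    """Flatten a 2-D weight map into a 1-D vector in which each (x, y)
    coordinate pair appears once per unit of its weight. Memory and build
    time grow with the total weight, which is why a high range of weight
    factors makes construction expensive."""
    stick = []
    for y, row in enumerate(weight_map):
        for x, w in enumerate(row):
            stick.extend([(x, y)] * w)
    return stick

def pick(stick, n, rng=random):
    # Every pick is a single uniform index into the vector, so picking
    # is O(1) per pixel regardless of the weights.
    return [rng.choice(stick) for _ in range(n)]
```

        Pixels with weight 10 occupy ten slots in the stick and so are picked ten times as often as pixels with weight 1; zero-weight pixels never appear.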

        Once I had this working, I was still not finding a good way to produce good weight maps from randomly chosen images. So then I decided to try and construct good weight maps randomly, but directly.

        This took a little trial and error, but I've come up with a method that seems to work quite well. It's still somewhat crude and I need to iron out some edge cases, but I've posted the code below.

      To generate the weight maps, I pick a few random points and assign each a random weight. Then I grade those high points out to the edges of the area along the x-axis; then, along the y-axis, I grade those values down to the strips of values created by the other points, or to the edges.

      Drawn in grey scale, this produces weight maps like these: [three grey-scale weight-map images], which I'm rather pleased with.

      Once weight-maps like these have been vectorised and then used to pick 1000 weight-random pixels, the results look like these: [three scatter-plot images].

      The results are everything I could have hoped for; though the current implementation leaves a lot to be desired -- especially the slowness of the vectorisation when a higher weight range is used. I'll probably have to move that process, and the grading process, into C to make this usable.

      If you can see improvements to either the grading process -- which currently occasionally produces really bizarre effects for reasons I haven't tracked down -- or ways of speeding up the vectorisation without dropping into C, I'd be very interested to hear them.

      The current code:


        Use the color or luminance or hue of the images to weight the picking of 'random' pixels.

        This is also quite hard to do other than via the rejection method

        There is a much more efficient method. See here, and here.

        The trick is to build a 1D array with the accumulated weights, @acu. Then, pick random numbers ($r) in the range [0, $acu[-1]) and use binary search to look for the index $ix such that $acu[$ix] <= $r < $acu[$ix + 1].
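        A sketch of that accumulated-weights-plus-binary-search approach, in Python rather than Perl for illustration (using the standard `bisect` module in place of a hand-rolled binary search; function names are mine):

```python
import bisect
import random
from itertools import accumulate

def build_acu(weights):
    """Accumulated weights: acu[i] is the sum of weights[0..i].
    This is O(n) once, versus O(total weight) for the 'weight stick'."""
    return list(accumulate(weights))

def pick_index(acu, rng=random):
    # Draw r uniformly in [0, total weight), then binary-search for the
    # first index whose accumulated weight exceeds r. Zero-weight entries
    # occupy an empty interval and so are never chosen.
    r = rng.random() * acu[-1]
    return bisect.bisect_right(acu, r)
```

        For a 2D weight map, flatten the pixels into one list of weights and convert the picked index back to (x, y); each pick then costs O(log n) with no per-weight memory blow-up.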
