Re^4: Randomly biased, random numbers. (A working solution)

by BrowserUk (Patriarch)
on Dec 07, 2013 at 15:43 UTC


in reply to Re^3: Randomly biased, random numbers.
in thread Randomly biased, random numbers.

Indeed. That's pretty similar to the ideas I had -- "Eg. grab a random image, process the image with a filter to reduce it to just points of a particular color or hue; or maybe use a Conway's Life type process to manipulate the pixels until groups of similar hues reduce to single points; or a dozen other ideas; and then use those points as my dataset." -- triggered by roboticus' post.

However, it turns out to be rather more difficult than I imagined.

I thought of two ways to tackle this approach:

  1. Try to derive the points for my test data directly from the randomly chosen images.

    It's fairly easy to manually pick and apply a few filters to any given image to reduce it to a bunch of discrete pixels -- converting to grey scale, then explosion followed by embossing works well for many images; as does repeatedly applying a high-pass filter until the number of non-black pixels is reduced to a usable number -- but finding a single sequence of filters that produces good datasets from a wide range of images is very hard.

    And even when doing this manually, it is surprising how often, once you've succeeded in reducing the image to discrete pixels, they end up pretty uniformly distributed.

  2. Use the color or luminance or hue of the images to weight the picking of 'random' pixels.

    This is also quite hard to do other than via the rejection method -- pick a random pixel and reject it if the chosen attribute is above or below some cut-off value -- which can be very time consuming.

    The only other method I came up with was to construct a 'weight stick'. Eg.

    Say this represents the 2D weights map:

    +--+--+--+--+--+--+--+--+--+--+
    | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
    +--+--+--+--+--+--+--+--+--+--+
    | 0| 5| 5| 4| 3| 3| 2| 2| 1| 0|
    +--+--+--+--+--+--+--+--+--+--+
    | 0| 5|10| 8| 6| 5| 4| 3| 1| 0|
    +--+--+--+--+--+--+--+--+--+--+
    | 0| 3| 6| 5| 5| 5| 5| 4| 2| 0|
    +--+--+--+--+--+--+--+--+--+--+
    | 0| 1| 2| 3| 4| 5| 6| 6| 3| 0|
    +--+--+--+--+--+--+--+--+--+--+
    | 0| 0| 1| 2| 3| 5| 5| 4| 3| 0|
    +--+--+--+--+--+--+--+--+--+--+
    | 0| 0| 0| 1| 2| 4| 3| 3| 2| 0|
    +--+--+--+--+--+--+--+--+--+--+
    | 0| 0| 0| 0| 1| 2| 1| 2| 1| 0|
    +--+--+--+--+--+--+--+--+--+--+
    | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
    +--+--+--+--+--+--+--+--+--+--+

    Then I build a 1D vector containing each (pixel coordinate pair) repeated x its weight:

    ([0,0])x 0, ([1,0])x 0, ([2,0])x 0, ...
    ([0,1])x 0, ([1,1])x 5, ([2,1])x 5, ([3,1])x 4, ([4,1])x 3, ...
    ([0,2])x 0, ([1,2])x 5, ([2,2])x 10, ([3,2])x 8, ...
    ...

    (I packed these into a scalar to save space.)

    Now, to pick pixels, I just index randomly into the vector and get one pixel coordinate pair for every pick. (A rough sketch of the idea follows this list.) The picking is fast, but the construction is relatively slow; and the higher the range of weight factors, the more memory it takes and the longer it takes to construct. But it works very well.

    Once I had this working, I was still not finding a good way to produce good weight maps from randomly chosen images. So then I decided to try and construct good weight maps randomly, but directly.

    This took a little trial and error, but I've come up with a method that seems to work quite well. It's still somewhat crude and I need to iron out some edge cases, but I've posted the code below.
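
For illustration, here is a minimal sketch of that 'weight stick' construction and picking. The toy map, the two-byte pack format and the sub names are just placeholders, not the real code:

    use strict;
    use warnings;

    # Toy weights map (the 9x10 grid above would do just as well).
    my @map = (
        [ 0, 0, 0, 0 ],
        [ 0, 5, 2, 0 ],
        [ 0, 1, 3, 0 ],
        [ 0, 0, 0, 0 ],
    );

    # Build the 'weight stick': each (x,y) pair appears weight-many times,
    # packed two bytes per entry to keep memory down ('C' caps coordinates
    # at 255; use 'S' or 'N' for bigger images).
    my $stick = '';
    for my $y ( 0 .. $#map ) {
        for my $x ( 0 .. $#{ $map[$y] } ) {
            $stick .= pack( 'CC', $x, $y ) x $map[$y][$x];
        }
    }

    # Picking is then just one random index into the stick per pixel.
    sub pick_pixel {
        my $i = int rand( length($stick) / 2 );
        return unpack 'CC', substr $stick, $i * 2, 2;    # ($x, $y)
    }

    # Example: 10 weight-biased picks.
    printf "(%d,%d)\n", pick_pixel() for 1 .. 10;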

To generate the weight maps, I pick a few random points and assign each of them a random weight. Then I grade each of those high points out to the edges of the area along the x-axis; and then grade those values, along the y-axis, out to the strips of values created by the other points, or to the edges.
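
A simplified sketch of that grading: linear fall-off in x from each seeded row, then linear interpolation down each column between the graded strips and the edges. The sizes, seed counts and fall-off formula here are just placeholders; the real code handles more cases:

    use strict;
    use warnings;

    # Toy dimensions, number of seed points and maximum weight.
    my ( $W, $H, $SEEDS, $MAXW ) = ( 20, 12, 3, 10 );

    my @map = map { [ (0) x $W ] } 1 .. $H;
    my @seedrows;

    # 1. Drop a few random points with random weights.
    for ( 1 .. $SEEDS ) {
        my ( $x, $y, $w ) = ( int rand $W, int rand $H, 1 + int rand $MAXW );
        $map[$y][$x] = $w;
        push @seedrows, $y;
    }
    my %seen;
    @seedrows = grep { !$seen{$_}++ } @seedrows;

    # 2. Grade each seeded row out to the left and right edges (linear fall-off).
    for my $y (@seedrows) {
        for my $x ( grep { $map[$y][$_] } 0 .. $W - 1 ) {
            my $w = $map[$y][$x];
            for my $xx ( 0 .. $W - 1 ) {
                my $v = int( $w * ( 1 - abs( $xx - $x ) / $W ) );
                $map[$y][$xx] = $v if $v > $map[$y][$xx];
            }
        }
    }

    # 3. Grade each column in y, interpolating between the graded strips
    #    and the top/bottom edges (treated as weight 0).
    my %s = map { $_ => 1 } ( 0, @seedrows, $H - 1 );
    my @strips = sort { $a <=> $b } keys %s;
    for my $x ( 0 .. $W - 1 ) {
        for my $i ( 0 .. $#strips - 1 ) {
            my ( $y0, $y1 ) = @strips[ $i, $i + 1 ];
            next if $y1 - $y0 < 2;
            my ( $v0, $v1 ) = ( $map[$y0][$x], $map[$y1][$x] );
            for my $y ( $y0 + 1 .. $y1 - 1 ) {
                my $t = ( $y - $y0 ) / ( $y1 - $y0 );
                my $v = int( $v0 + ( $v1 - $v0 ) * $t );
                $map[$y][$x] = $v if $v > $map[$y][$x];
            }
        }
    }

    # Dump the map so the gradient is visible.
    print join( ' ', map { sprintf '%2d', $_ } @$_ ), "\n" for @map;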

Drawn in grey scale, this produces weight maps like these [three example weight-map images], which I'm rather pleased with.

Once weight-maps like these have been vectorised and then used to pick 1000 weight-random pixels, the results look like these [three example images of the picked points].

The results are everything I could have hoped for; though the current implementation leaves a lot to be desired -- especially the slowness of the vectorisation when a higher weight range is used. I'll probably have to move that process and the grading process into C to make this usable.

If you can see improvements to either the grading process -- which currently occasionally produces really bizarre effects for reasons I haven't tracked down -- or ways of speeding up the vectorisation without dropping into C, I'd be very interested to hear them.

The current code:


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^5: Randomly biased, random numbers. (A working solution)
by salva (Canon) on Dec 10, 2013 at 13:58 UTC
    Use the color or luminance or hue of the images to weight the picking of 'random' pixels.

    This is also quite hard to do other than via the rejection method

    There is a much more efficient method. See here, and here.

    The trick is to build a 1D array with the accumulated weights @acu. Then, pick random numbers ($r) in the range [0, $acu[-1]) and use binary search to look for the index $ix such that $acu[$ix] <= $r < $acu[$ix + 1].
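
    A minimal sketch of that scheme (the example weights and the sub name are made up for illustration):

        use strict;
        use warnings;

        # Hypothetical per-pixel weights (flattened); any non-negative
        # numbers work, e.g. the luminance of each pixel.
        my @weights = ( 0, 5, 5, 4, 3, 3, 2, 2, 1, 0 );

        # Accumulated weights: $acu[$i] is the sum of weights 0 .. $i-1,
        # so $acu[-1] is the total and $acu[$ix] <= $r < $acu[$ix+1]
        # selects index $ix.
        my @acu = (0);
        push @acu, $acu[-1] + $_ for @weights;

        sub pick_weighted {
            my $r = rand $acu[-1];                     # uniform in [0, total)
            my ( $lo, $hi ) = ( 0, $#weights );
            while ( $lo < $hi ) {                      # binary search for the largest
                my $mid = int( ( $lo + $hi + 1 ) / 2 );    # $ix with $acu[$ix] <= $r
                if ( $acu[$mid] <= $r ) { $lo = $mid } else { $hi = $mid - 1 }
            }
            return $lo;
        }

        # Quick check: tally 10_000 picks; counts should follow the weights.
        my %tally;
        $tally{ pick_weighted() }++ for 1 .. 10_000;
        printf "index %d (weight %2d): %d\n", $_, $weights[$_], $tally{$_} // 0
            for 0 .. $#weights;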

      That's nice. I'll have to try it on a random selection of images, but it is definitely interesting.

      (I'm a bit confused: why are you writing back the current pixel with the same color you just read, and then setting the color of the adjacent pixel one row down to the newly calculated color? That means you'll be re-reading your new values when processing the next row.

      And why x => $w + $i rather than y => $j + 1?)


        The output file contains three images side by side:
        • the original image at the left
        • the weights image in the middle
        • and the image with the random points at the right
