
Re: Comparing images to find similar images in a database

by wollmers (Scribe)
on Dec 05, 2014 at 19:22 UTC


in reply to Comparing images to find similar images in a database

As I understand it, the author merlin defines a match as follows:

    my $FUZZ = 5; # permitted average deviation in the vector elements
    ...
    BUCKET: for my $bucket (@buckets) {
      my $error = 0;
      INDEX: for my $index (0..$#vector) {
        $error += abs($bucket->[0][$index] - $vector[$index]);
        next BUCKET if $error > $FUZZ * @vector;
      }
      ...

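IMHO the above set of matches is a subset of all matches where the sum of the vector elements lies within $FUZZ * @vector of the pattern's sum, because the difference between two sums can never exceed the sum of the absolute element-wise differences: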

    use List::Util qw(sum);

    # sum of the elements, not the element count
    my $pattern_sum = sum(@pattern_vector);
    my $upper_bound = $pattern_sum + $FUZZ * @pattern_vector;
    my $lower_bound = $pattern_sum - $FUZZ * @pattern_vector;

    BUCKET: for my $bucket (@buckets) {
      my $bucket_sum = sum(@{$bucket->[0]});
      next BUCKET if ($bucket_sum > $upper_bound || $bucket_sum < $lower_bound);
      # found, do something
    }

Depending on how the sums are distributed, this yields roughly twice as many matches. For example, with $FUZZ=5 and a vector of 48 elements, each an 8-bit integer, the number of possible different sums of the vector values is 48*255+1=12_241. The original method allows at most 48*5=240 as the sum of the absolute differences, so the set of all possible sums is reduced by a factor of 12_241/240=51. With an interval of +/- 5 per element around the pattern sum, the window is 48*5*2=480 wide and the reduction factor is only about 25. That would mean roughly 1_000 candidates found out of a total of 25_000 images.
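The arithmetic above can be rechecked with a few lines of Perl. This is only a sketch of the estimate: the 48 elements, the 8-bit range, $FUZZ=5 and the 25_000 images come from the text, while the helper name reduction() is made up for illustration.

```perl
use strict;
use warnings;

# Parameters taken from the text above.
my $elements  = 48;     # vector length
my $max_value = 255;    # largest 8-bit value
my $FUZZ      = 5;      # permitted average deviation per element

my $possible_sums = $elements * $max_value + 1;   # sums 0 .. 12_240
my $max_error     = $elements * $FUZZ;            # bound used by the original method
my $window        = 2 * $max_error;               # width of the +/- interval on the sum

# Hypothetical helper: how strongly a window of the given width
# narrows down the range of possible sums.
sub reduction {
    my ($width) = @_;
    return $possible_sums / $width;
}

printf "possible sums:           %d\n",   $possible_sums;
printf "reduction, original:     %.1f\n", reduction($max_error);
printf "reduction, sum interval: %.1f\n", reduction($window);
printf "candidates of 25_000:    %d\n",   25_000 / reduction($window);
```

The estimate assumes the sums are spread evenly over the whole range, which real image data will not do exactly, so the candidate count is a rough guide only.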

But if we calculate the sum of the vector, we can store it as an integer field in the database and use SQL comparisons.
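A minimal sketch of such a query, assuming a hypothetical table images with an integer column vector_sum (neither name comes from the original article). The function only builds the SQL text and the bind values, so it runs without a database; the vector is assumed to be non-empty.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Build a range query on the stored sum.
# Table "images" and column "vector_sum" are assumptions for illustration.
sub sum_range_query {
    my ($fuzz, @pattern_vector) = @_;
    my $pattern_sum = sum(@pattern_vector);
    # In numeric context @pattern_vector yields the element count.
    my $tolerance   = $fuzz * @pattern_vector;
    my $sql = 'SELECT id FROM images WHERE vector_sum BETWEEN ? AND ?';
    return ($sql, $pattern_sum - $tolerance, $pattern_sum + $tolerance);
}

# Tiny made-up pattern vector: sum 100, tolerance 5 * 4 = 20.
my ($sql, @bind) = sum_range_query(5, 10, 20, 30, 40);
print "$sql\n";
print "bind values: @bind\n";
```

With DBI the result could then be fetched with something like $dbh->selectcol_arrayref($sql, undef, @bind). An index on the sum column should let the database answer the range condition without scanning every row.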

The query result could still be refined using the original method, or something better, e.g. cosine similarity, which should be fast enough for ~1_000 vectors.
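As a sketch of the cosine similarity refinement, assuming non-empty vectors of equal length held in array references. The sample vectors and the names near and far are made up; both have the same sum as the pattern, so a filter on the sum alone would let both through.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Cosine similarity of two equal-length, non-empty numeric vectors.
# Returns 0 if either vector consists only of zeros.
sub cosine_similarity {
    my ($u, $v) = @_;
    my $dot    = sum(map { $u->[$_] * $v->[$_] } 0 .. $#$u);
    my $norm_u = sqrt(sum(map { $_ ** 2 } @$u));
    my $norm_v = sqrt(sum(map { $_ ** 2 } @$v));
    return 0 unless $norm_u && $norm_v;
    return $dot / ($norm_u * $norm_v);
}

# Made-up data standing in for the candidates returned by the sum query.
my @pattern    = (10, 20, 30, 40);
my %candidates = (
    near => [11, 19, 31, 39],   # same sum, similar shape
    far  => [40, 30, 20, 10],   # same sum, reversed shape
);

# Score every candidate and list the best match first.
my %score = map { $_ => cosine_similarity(\@pattern, $candidates{$_}) }
            keys %candidates;
for my $name (sort { $score{$b} <=> $score{$a} } keys %score) {
    printf "%-5s %.4f\n", $name, $score{$name};
}
```

A threshold on the score would then decide which candidates count as matches; a suitable value depends on the data and is not given in the original post.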
