I just re-read your original post, and was reminded of this:
there are about 500 times less 1's then 0's
So for the average item, only 30 features are true, which changes things. You still can't do an all-to-all pairwise comparison in any reasonable amount of time, but you may be able to e.g. recursively partition your data by the feature closest to a 50/50 distribution, then find closest matches once you cut the comparison set down to something reasonable. If something simple like that won't work, you should look into dimensionality reduction, e.g. via random projection or PCA.
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.
|