PerlMonks
Re: Netflix (or on handling large amounts of data efficiently in perl)
by matrixmadhan (Beadle) on Dec 24, 2008 at 05:17 UTC ( [id://732414]=note )
Nice problem
I have some suggestions for optimizing the data representation, which I think are feasible for the three fields in this problem: movie_id, user_id and rating.

From your post it seems that all three values are critical. Without user_id, the <movie_id><rating> pairs are not unique; the same pairs repeat across users. For example:

    <movie_id><user_id><rating>
    <1><U1><2>
    <1><U2><3>
    <1><U3><2>
    <1><U4><3>

In this sample data, movie_id and rating repeat, so instead of storing a movie_id and a rating every time, a map of the 5 possible ratings for each movie can be used:

    <movie_id><rating>
    <1><1> => a
    <1><2> => b
    <1><3> => c
    <1><4> => d
    <1><5> => e

The new records then hold only the user_id and the mapped code:

    <U1><b1>
    <U2><c1>

This adds an extra lookup on retrieval, but the stored data is compressed because each pair is mapped to a single new value.

The same logic can be extended to a second level of mapping that captures "users with a specific rating pattern":

    <userid><rating>
    <U1><1> => a1
    <U1><2> => a2

and those values can be used along with the movie id.

For the lookup itself, a simple Berkeley DB would be the easiest option, both to implement and to retrieve from. An alternative you might think of is appending the attribute values together and storing them, but that is not going to do any good in terms of either retrieval or storage.

Please feel free to tell me I'm wrong if I really am. :)
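To make the mapping idea concrete, here is a rough sketch, written in Python only for brevity. It is one possible reading of the scheme, not a tested solution for the full Netflix data set. The assumptions are mine: movie ids are integers, ratings run from 1 to 5, the letter codes (a to e) are replaced by packing the pair into a single integer, and a plain dict stands in for the Berkeley DB lookup. The names `encode`, `decode`, `add_rating`, `ratings_for` and `ratings_by_user` are made up for illustration.

```python
RATINGS_PER_MOVIE = 5  # assumption: ratings are integers 1..5


def encode(movie_id, rating):
    """Pack a (movie_id, rating) pair into one integer code."""
    if not 1 <= rating <= RATINGS_PER_MOVIE:
        raise ValueError("rating must be between 1 and 5")
    return movie_id * RATINGS_PER_MOVIE + (rating - 1)


def decode(code):
    """Recover the (movie_id, rating) pair from a packed code."""
    movie_id, offset = divmod(code, RATINGS_PER_MOVIE)
    return movie_id, offset + 1


# user_id -> list of packed codes.
# A plain dict stands in here for the Berkeley DB lookup (assumption).
ratings_by_user = {}


def add_rating(user_id, movie_id, rating):
    """Store one rating for a user as a single packed code."""
    ratings_by_user.setdefault(user_id, []).append(encode(movie_id, rating))


def ratings_for(user_id):
    """Return the user's ratings as (movie_id, rating) pairs."""
    return [decode(code) for code in ratings_by_user.get(user_id, [])]


# The four sample rows from the post, all for movie 1.
for user_id, rating in (("U1", 2), ("U2", 3), ("U3", 2), ("U4", 3)):
    add_rating(user_id, 1, rating)
```

With this in place, `ratings_for("U2")` decodes the stored code back into the movie id and rating, which is the extra lookup step mentioned above. Whether the packing actually saves space depends on how the codes are serialized on disk.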