|We don't bite newbies here... much|
But collision detection is an interesting area of research and there are a lot of algorithms out there. Much depends on your application in terms of how sparse your model is, whether these boxes represent objects moving in time, are they close to uniform size, etc. This is a basic area of computer graphics programming and squares or oblong boxes are often used as bounding boxes to test whether objects bump each other or not.
Also, millions of boxes creates memory and time constraints that may make many algorithms difficult to use. I recommend reading this paper on collision detection for large scale environments which describes methods used to overcome the exponential time requirements for pairwise comparisons.
One good technique seems to be to project the objects onto each axis so you get an interval on X and an interval on Y, and sorting separately these lists (projection is trivial if your rectangles are never rotated at an angle). Two rectangles can only intersect if their X and Y intervals intersect (though the intersection might be a point or line, not a rectangle, so you have to define whether a point touching is an intersection in your system).
It is also possible to create an "interval tree" (a range tree) that can be queried to find intersections. By sweeping a vertical line across the X axis you can look for intersections and prune. Perhaps creating such a tree is a bit overkill for you.
I haven't read enough of the paper to tell if the interval tree is in fact the same thing as a a hierarchy of bounding boxes to reduce the number of elements to sort. Also, if you have mostly similarly sized objects then the technique of dividing the space into equal volumes and checking for collisions within them may also be useful. Since you have a million objects, efficient collision detection and sorting are very important.
Also regarding sorting, you may be interested to know that there are more advanced ways to sort than just the standard sort function. Advanced sorting techniques like GRT and the Orcish maneuver, which cache keys to reduce the number of sorts needed, are discussed by Uri Guttman and Larry Rosler and available in Guttman's Sort::Maker module (along with our merlyn's Schwartzian Transform! Also from Perl 5.8.8 there is a sort pragma mentioned here that might be tweakable to improve performance even of the plain sort function. I'd like to hear how it turns out.
At least in the above type algorithms, collision detection is based actually on sorting. However it seems to me that the actual order of implementation and which method you choose depends on your data. Certainly if you have all positive integer coordinates you can tell the GRT so and it will be more efficient (it only checks the most significant bytes first). If you know the lower left coordinate is guaranteed to actually be the lower left one and not backwards, then you can eliminate huge swaths of the space by only sorting the x coordinate of that point. If you did indeed have a list sorted by area you could arbitrarily discard the lower 90% of your rectangles.. but sorting a million of anything will probably take too long, so it is best if you can make your first sweep through the data do something easy like partitioning it, or using some trick like pack() to discard for example anything with less digits, etc. For a real implementation I'd have to know more about just what you are trying to do and see some of the data.