
Re: Zipcode Proximity script

by blokhead (Monsignor)
on Mar 31, 2003 at 16:26 UTC (#246981)

in reply to Zipcode Proximity script

If your search radius is that small (25 miles), you could eliminate 90% of your data before doing most of the number crunching by applying a cheaper, less accurate estimate first. One option is to check whether the latitude and longitude are each within ~10 degrees (or some constant appropriate for your data) of the origin, and skip the coordinates if they are not. Maybe just comparing the number of degrees is a bad choice, but I also know there are several great-circle distance algorithms around, with varying trade-offs between speed and accuracy. The key is to eliminate bad matches first with the low-accuracy algorithm, then use the high-accuracy one on the close ones.
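A minimal sketch of that two-stage idea, in Python rather than Perl and under my own assumptions: the names `haversine_miles` and `zips_within`, the Earth radius constant, and the `(zip, lat, lon)` tuple layout are all made up for illustration, and the haversine formula is just one of the great-circle algorithms one could plug in. It ignores wrap-around at the +/-180 degree meridian.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def zips_within(origin, points, radius_miles, max_deg=10.0):
    """Return the zips from points, an iterable of (zip, lat, lon), in range."""
    olat, olon = origin
    hits = []
    for zip_code, lat, lon in points:
        # Cheap pass: skip anything more than max_deg degrees off on either axis.
        if abs(lat - olat) > max_deg or abs(lon - olon) > max_deg:
            continue
        # Expensive pass: only the survivors get the trig-heavy calculation.
        if haversine_miles(olat, olon, lat, lon) <= radius_miles:
            hits.append(zip_code)
    return hits
```

The cheap pass costs two subtractions and two comparisons per point, so the trigonometry only runs on whatever is left.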

You may also wish to check out Geo::Distance or Geo::PostalCode to see how they do it. Geo::PostalCode claims to have a feature exactly like the one you describe.


Replies are listed 'Best First'.
Re: Re: Zipcode Proximity script
by paulbort (Hermit) on Mar 31, 2003 at 19:35 UTC
    To add to this: If you don't really care about great circle distance, and you can get away with flat-earth distances, you could start by eliminating all points that are outside the smallest box that fits around your circle. This will eliminate most of the data in two passes (outside the latitude bounds, and outside the longitude bounds). If you store the master data in a SQL database with a couple of indexes, you could just do a query to get the starting set, then trim it with the circle algorithm you already have.

    The advantage this has over blokhead's method is that it will work correctly regardless of circle size, and eliminate more points, in exchange for four extra calculations (N, S, E, W bounds). (Ten degrees is a lot of land.)
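The four bounds could be computed along these lines. This is only a flat-earth sketch in Python: `bounding_box`, `in_box`, and the 69-miles-per-degree figure are my own assumptions, longitudes are taken as east-positive (so US longitudes are negative), and circles touching a pole or the +/-180 degree meridian are not handled.

```python
import math

MILES_PER_DEG_LAT = 69.0  # rough average; an assumed constant


def bounding_box(center_lat, center_lon, radius_miles):
    """Return (south, north, west, east) in degrees for a box around the circle."""
    dlat = radius_miles / MILES_PER_DEG_LAT
    # A degree of longitude covers fewer miles away from the equator. Use the
    # box edge nearest the pole so the box errs on the side of being too wide.
    widest_lat = abs(center_lat) + dlat
    dlon = radius_miles / (MILES_PER_DEG_LAT * math.cos(math.radians(widest_lat)))
    return (center_lat - dlat, center_lat + dlat,
            center_lon - dlon, center_lon + dlon)


def in_box(lat, lon, box):
    """True if the point lies inside the box (edges included)."""
    south, north, west, east = box
    return south <= lat <= north and west <= lon <= east
```

The box is computed once per search, so the per-point cost is just the four comparisons in `in_box`; points in the corners of the box still need the circle test afterwards.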

    You might also want to consider storing the points in a database so that a SQL Query on a couple of indexes can do most of the data elimination for you in one pass.

    Also, if you multiply the Lat/Longs by 10,000, you can store and manipulate them as long integers instead of single-precision floats. (This alone might be enough to solve your performance problem.)
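To illustrate the scaling trick, here is a small Python sketch; `SCALE`, `to_fixed`, and `from_fixed` are names I made up. Four decimal places of a degree works out to roughly 36 feet of latitude, which should be plenty for zip code centroids.

```python
SCALE = 10_000  # keep four decimal places of a degree


def to_fixed(degrees):
    """Convert decimal degrees to a scaled integer."""
    return round(degrees * SCALE)


def from_fixed(value):
    """Convert a scaled integer back to decimal degrees."""
    return value / SCALE


# Bounds and points are converted once; after that every box test is
# plain integer comparison.
south, north = to_fixed(40.35), to_fixed(41.07)
lat = to_fixed(40.7128)
inside_lat_band = south <= lat <= north
```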

    Just a quick stab at some SQL for this: (It assumes you've already calculated the bounding box.)
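    SELECT zip
      FROM zip_lat_long
     WHERE lat > $south_border AND lat < $north_border
       AND long > $east_border AND long < $west_border
       AND $max_distance ** 2 > ( ABS( lat - $center_lat ) ** 2 + ABS( long - $center_long ) ** 2 );
    (With luck an optimizer will index seek the first four conditions before crunching the last one. Note that $max_distance has to be in the same units as lat and long, i.e. degrees, and that the long comparison as written assumes longitudes that grow westward; swap the two borders if your data stores western longitudes as negative numbers.)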
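For anyone who wants to try the query, here is a rough, self-contained version using Python's sqlite3 module with an in-memory table. Everything beyond the table name is an assumption on my part: the column is called `lon` because `long` is a reserved word in some databases, longitudes are east-positive, the sample rows are invented, and the squared terms are written as plain multiplication instead of `**`.

```python
import sqlite3

QUERY = """
    SELECT zip FROM zip_lat_long
     WHERE lat BETWEEN :south AND :north
       AND lon BETWEEN :west AND :east
       AND (lat - :clat) * (lat - :clat) + (lon - :clon) * (lon - :clon)
           <= :max_deg * :max_deg
     ORDER BY zip
"""


def zips_in_circle(conn, center_lat, center_lon, box, max_deg):
    """Box filter first (index-friendly), then the flat-earth circle test."""
    south, north, west, east = box
    params = {"south": south, "north": north, "west": west, "east": east,
              "clat": center_lat, "clon": center_lon, "max_deg": max_deg}
    return [row[0] for row in conn.execute(QUERY, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zip_lat_long (zip TEXT PRIMARY KEY, lat REAL, lon REAL)")
conn.execute("CREATE INDEX idx_lat ON zip_lat_long (lat)")
conn.execute("CREATE INDEX idx_lon ON zip_lat_long (lon)")
conn.executemany(
    "INSERT INTO zip_lat_long VALUES (?, ?, ?)",
    [("A", 40.0, -75.0), ("B", 40.1, -75.1),   # inside the circle
     ("C", 40.3, -75.3),                       # inside the box, outside the circle
     ("D", 45.0, -80.0)],                      # outside the box
)

nearby = zips_in_circle(conn, 40.0, -75.0, (39.6, 40.4, -75.4, -74.6), 0.36)
```

Whether the two indexes actually get used for the box conditions depends on the database's query planner, so it is worth checking the plan on real data.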

    This is a neat problem; I wish I had time to bang out some sample code.
    Spring: Forces, Coiled Again!
