
I've been working with some C code that crunches the 2000 US Census data into CSV files, based on the specified proximity to the origin zipcode. The problem is that the C code is horribly slow, and I can't figure out why. It takes my PIII/1.3GHz/512MB RAM machine about 20 minutes to crunch the 987k input data file for zipcodes within a 0-25 mile radius of the given origin zipcode. That seems very slow.

The master 2000 Census data file contains records in this format:

   ZIP_CODE LONGITUDE LATITUDE
   00210 71.0132 43.00589
   00211 71.0132 43.00589
   00212 71.0132 43.00589
   00213 71.0132 43.00589
   00214 71.0132 43.00589
   00215 71.0132 43.00589
   ...
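
A minimal sketch of reading one record in that layout, assuming whitespace-separated fields in the order zipcode, longitude, latitude. The `zip_rec` struct and `parse_zip_line` function are hypothetical names for illustration, not part of the existing C program:

```c
#include <stdio.h>

/* Hypothetical record layout; the field names are assumptions. */
struct zip_rec {
    char zip[6];   /* five characters plus the terminating NUL */
    double lon;    /* degrees, second column of the file */
    double lat;    /* degrees, third column of the file */
};

/* Parse one line of the master file into *out.
   Returns 1 if all three fields were read, 0 otherwise
   (for example on the header line or a blank line). */
static int parse_zip_line(const char *line, struct zip_rec *out) {
    return sscanf(line, "%5s %lf %lf",
                  out->zip, &out->lon, &out->lat) == 3;
}
```

Calling this once per line and keeping the results in an array means the file only has to be read a single time, however many radius files are produced afterwards.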

My output file, separate for each type of range (0-25.txt for zipcodes within 0-25 miles of the origin, 0-50.txt for zipcodes within 0-50 miles of the origin, etc.), contains entries such as:

   00210,00210
   00210,00211
   00211,00210
   00210,00212
   00212,00210
   00210,00213
   ...

For each given zipcode found in the master file (where origin == 00210 in this case, to start with), I want to output a file that contains all matching zipcodes within the specified proximity to that zipcode. So in the example above, all of the zipcodes within 0-25 miles of 00210 would be output to 0-25.txt, a csv file containing the data shown above.

I have working radius functions that do this (correctly, but very slowly); they look like:

   #include <math.h>   /* M_PI, sin, cos, pow, sqrt, atan2 */

   #define EARTH_RADIUS 3956   /* miles */

   static inline float deg_to_rad(float deg) {
        return (deg * M_PI / 180.0);
   }
   
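   /* Function to calculate Great Circle distance 
      between two points (haversine formula). All four
      arguments must already be in radians (see
      deg_to_rad above), since they are passed straight
      to sin() and cos(). */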
   static inline float great_circle_distance(float lat1,
                                             float long1,
                                             float lat2,
                                             float long2) {
           float delta_long, delta_lat, temp, distance;

           /* Find the deltas */
           delta_lat = lat2 - lat1;
           delta_long = long2 - long1;

           /* Find the GC distance */
           temp = pow(sin(delta_lat / 2.0),
                      2) + cos(lat1) * cos(lat2)
                      * pow(sin(delta_long / 2.0), 2);

           distance = EARTH_RADIUS * 2 * atan2(sqrt(temp),
                      sqrt(1 - temp));

           return (distance);
   }

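In Perl, a simple flat-plane (Euclidean) distance, which is not the same thing as the great-circle calculation above, would be: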

   my $distance = sqrt(($x1-$x2)**2+($y1-$y2)**2);

My goal is to convert this over to Perl, so that I can gain the speed and efficiency of Perl, make this portable to Windows systems (where the current C code doesn't quite run yet), and expand my knowledge of Perl in general.

Has anyone done this? Any pointers that might be useful here?

In reply to Zipcode Proximity script by hacker
