I've been working with some C code that crunches the 2000 US Census data into CSV files, based on the specified proximity to the origin zipcode. The problem is that the C code is horribly slow, and I can't seem to figure out why. It takes my PIII/1.3GHz/512MB RAM machine about 20 minutes to crunch the 987k input data file for zipcodes within a 0-25 mile radius of the given origin zipcode. That seems very slow.

The master 2000 Census data file contains records in this format:

ZIP_CODE ONGITUD ATITUD
00210    71.0132 43.00589
00211    71.0132 43.00589
00212    71.0132 43.00589
00213    71.0132 43.00589
00214    71.0132 43.00589
00215    71.0132 43.00589
...
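A minimal sketch of reading one record in that layout, assuming whitespace-separated columns in the order zipcode, longitude, latitude. The `zip_record` type and `parse_zip_record` name are hypothetical and not part of the original code. The zipcode is kept as text so that leading zeros survive.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record type; field names are illustrative only. */
typedef struct {
    char  zip[6];      /* five digits plus NUL, stored as text to keep leading zeros */
    float longitude;   /* degrees, second column of the master file */
    float latitude;    /* degrees, third column of the master file */
} zip_record;

/* Parse one data line such as "00210 71.0132 43.00589".
 * Returns 1 on success, 0 if the line does not match (for example the header row). */
static int parse_zip_record(const char *line, zip_record *out)
{
    assert(line != NULL && out != NULL);

    /* %5s reads at most five characters, so zip[6] cannot overflow. */
    if (sscanf(line, "%5s %f %f", out->zip, &out->longitude, &out->latitude) != 3)
        return 0;

    return strlen(out->zip) == 5;
}
```

The header line fails the numeric conversions, so it is rejected rather than stored as a record.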

My output file, separate for each type of range (0-25.txt for zipcodes within 0-25 miles of the origin, 0-50.txt for zipcodes within 0-50 miles of the origin, etc.), contains entries such as:

00210,00210
00210,00211
00211,00210
00210,00212
00212,00210
00210,00213
...

For each given zipcode found in the master file (where origin == 00210 in this case, to start with), I want to output a file that contains all matching zipcodes within the specified proximity to that zipcode. So in the example above, all of the zipcodes within 0-25 miles of 00210 would be output to 0-25.txt, a CSV file containing the data shown above.
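The output step described here could be sketched roughly as follows. This assumes the distances from the origin to each candidate zipcode have already been computed, in miles, into an array; `write_matches` and its parameters are hypothetical names, not taken from the original program.

```c
#include <assert.h>
#include <stdio.h>

/* Write one "origin,zip" line to `out` for every candidate whose
 * precomputed distance (in miles) is within limit_miles.
 * Returns the number of lines written. */
static int write_matches(FILE *out, const char *origin,
                         const char *const zips[], const float dist_miles[],
                         int count, float limit_miles)
{
    int written = 0;

    assert(out != NULL && origin != NULL);

    for (int i = 0; i < count; i++) {
        if (dist_miles[i] <= limit_miles) {
            fprintf(out, "%s,%s\n", origin, zips[i]);
            written++;
        }
    }
    return written;
}
```

Since the ranges all start at 0, the same distance array could be passed once with a limit of 25 for 0-25.txt and again with a limit of 50 for 0-50.txt, so each distance only needs to be computed once per origin.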

I have working radius functions that do this correctly (but very slowly). They look like:

#include <math.h>

#define EARTH_RADIUS 3956   /* miles */

static inline float deg_to_rad(float deg)
{
    return (deg * M_PI / 180.0);
}

/* Function to calculate Great Circle distance between two points.
   The formula applies sin/cos directly, so all four arguments must
   already be in radians (see deg_to_rad). */
static inline float great_circle_distance(float lat1, float long1,
                                          float lat2, float long2)
{
    float delta_long, delta_lat, temp, distance;

    /* Find the deltas */
    delta_lat  = lat2 - lat1;
    delta_long = long2 - long1;

    /* Find the GC distance */
    temp = pow(sin(delta_lat / 2.0), 2) + cos(lat1) * cos(lat2) * pow(sin(delta_long / 2.0), 2);
    distance = EARTH_RADIUS * 2 * atan2(sqrt(temp), sqrt(1 - temp));

    return (distance);
}

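In perl, a simple planar version of the distance calculation (straight-line distance between two points, not the great-circle formula) would be: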

my $distance = sqrt(($x1-$x2)**2+($y1-$y2)**2);

My goal is to convert this over to Perl, so I can gain the speed and efficiency of Perl, make this portable to Windows systems (where the current C code doesn't quite run yet), and expand my knowledge of Perl in general.

Has anyone done this? Any pointers that might be useful here?

In reply to Zipcode Proximity script by hacker
