PerlMonks  

Re: Data Salad Address Problem

by socketdave (Curate)
on Jul 28, 2005 at 14:41 UTC ( [id://478991]=note )


in reply to Data Salad Address Problem

This looks pretty horrible... Junk in, junk out.

That said, if you know that you'll always have the same format for the city and state, and always have a nine-digit ZIP, you may have a chance. Work backward from the last field. Find the ZIP with a regex, then the city and state. After that, you'll have to assume that whatever is in the field just before them contains the street address. Good luck!

Update: I just noticed that you do have a five-digit ZIP in there. It won't make that much difference in the accuracy ;)
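The "work backward" idea can be sketched roughly as follows. This is only an illustration: `split_tail` is a hypothetical helper, the sample string is made up rather than taken from the original data, and it assumes a "CITY[,] ST ZIP" layout with a two-letter state, which the real records may well violate.

```perl
use strict;
use warnings;

# Hypothetical helper: peel ZIP, state, and city off the END of a string.
# Assumes the layout "CITY[,] ST ZIP" with a two-letter state; messy data
# will need more patterns than this one.
sub split_tail {
    my ($text) = @_;
    return
      unless $text =~ /^(.*?),?\s+([A-Za-z]{2})\s+(\d{5}(?:-\d{4})?)\s*$/;
    return ( city => $1, state => uc $2, zip => $3 );
}

# Made-up sample value, not from the original data
my %parts = split_tail('SPRINGFIELD, MA 01101-1234');
print "$parts{city} / $parts{state} / $parts{zip}\n" if %parts;
```

Anchoring the pattern at the end of the string is what makes this "last field first": the ZIP is matched first, and everything to its left is interpreted relative to it. Records that fail the match come back as an empty list and can be set aside for manual review.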

Replies are listed 'Best First'.
Re^2: Data Salad Address Problem
by SamCG (Hermit) on Jul 28, 2005 at 14:52 UTC
    I agree it's horrible. . . and unfortunately, I can be sure of very little regarding the formatting. I see a number of records that put commas between city and state (which isn't really a big problem), and some that abbreviate state names with things like "MASS" and "WASH" (oh, joy).

    Thanks for the good wishes. . .
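One possible way to cope with variants like "MASS" and "WASH" is a small alias table applied before any further matching. The sketch below is an assumption, not something from the thread: `%state_alias` and `normalize_state` are hypothetical names, and only the two aliases mentioned above are filled in.

```perl
use strict;
use warnings;

# Assumed lookup table: only MASS and WASH come from the thread.
# Extend it with whatever variants turn up in the real data.
my %state_alias = (
    MASS => 'MA',
    WASH => 'WA',
);

# Hypothetical helper: upper-case, strip anything that is not a letter,
# then map known aliases; unknown values pass through unchanged.
sub normalize_state {
    my ($raw) = @_;
    ( my $key = uc $raw ) =~ s/[^A-Z]//g;
    return exists $state_alias{$key} ? $state_alias{$key} : $key;
}

print normalize_state('Mass.'), "\n";
print normalize_state('wa'),    "\n";
```

Stripping punctuation first means "Mass.", "MASS," and "mass" all hit the same table entry, so the comma between city and state stops mattering at this step.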
      You're basically going to have to quantify the different possibilities and allow for them individually. I was able to get the zip codes accurately from your sample data:

      unless ( ($zip) = ($field5 =~ /(\d{5}-\d{4})/) ) {
          unless ( ($zip) = ($field5 =~ /(\d{5})/) ) {
              unless ( ($zip) = ($field4 =~ /(\d{5}-\d{4})/) ) {
                  ($zip) = ($field4 =~ /(\d{5})/);
              }
          }
      }


      but that's already pretty nasty...
        Actually, I see that the third record from the bottom has a 5-digit ZIP code, with no dash and no four-digit extension... Could be that we need to make the second part optional... Yeah, oh joy...

        --------------------------------
        An idea is not responsible for the people who believe in it...
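Making the second part optional might look like the sketch below, which folds the 5-digit and ZIP+4 cases into one pattern and tries the fields in order. The helper name `first_zip` and the sample values are made up for illustration. Note one difference from the nested version: this takes the leftmost 5-digit run in a field, so a 5-digit house number in the same field would be picked up by mistake.

```perl
use strict;
use warnings;

# Five digits, optionally followed by a dash and four more digits.
my $zip_re = qr/\b(\d{5}(?:-\d{4})?)\b/;

# Hypothetical helper: return the first ZIP found, trying the
# fields in the order given; returns nothing if none match.
sub first_zip {
    for my $field (@_) {
        next unless defined $field;
        return $1 if $field =~ $zip_re;
    }
    return;
}

# Made-up values standing in for $field5 and $field4
my $zip = first_zip( 'BOSTON MASS 02134', '99 ELM ST' );
print "$zip\n" if defined $zip;
```

Calling it as `first_zip($field5, $field4)` keeps the same field priority as the nested `unless` blocks while removing the duplicated patterns.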
