in reply to Data Salad Address Problem
This looks pretty horrible... Junk in, junk out.
That said, if you know that you'll always have the same format for the city and state, and always have a nine-digit ZIP, you may have a chance. Work from the last field forward: find the ZIP with a regex, then the city and state. After that, you'll have to assume that whatever is in the next field over contains the street address. Good luck!
Update: I just noticed that you do have a five-digit ZIP in there. It won't make that much difference in the accuracy ;)
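For what it's worth, here is a minimal sketch of the back-to-front idea. It assumes the record has already been split into fields, that the state is a two-letter code, and that the street is in the field just before the city/state/ZIP field. The sub name `parse_tail` and the sample record are made up for illustration and are not from the original data.

```perl
use strict;
use warnings;

# Hypothetical helper: scan the fields from last to first and pull out the
# first one that looks like "City ST 12345" or "City, ST 12345-6789".
# Assumes a two-letter state code; anything else is left unparsed.
sub parse_tail {
    my @fields = @_;
    for my $i ( reverse 0 .. $#fields ) {
        next unless defined $fields[$i];
        if ( $fields[$i] =~ /^\s*(.+?),?\s+([A-Z]{2})\s+(\d{5}(?:-\d{4})?)\s*$/ ) {
            my ( $city, $state, $zip ) = ( $1, $2, $3 );

            # Assumption: the field just before this one holds the street.
            my $street = $i > 0 ? $fields[ $i - 1 ] : undef;
            return {
                street => $street,
                city   => $city,
                state  => $state,
                zip    => $zip,
            };
        }
    }
    return;    # nothing recognizable in any field
}

# Made-up record for illustration only.
my $rec = parse_tail( 'Acme Corp', '12 Main St', 'Springfield, IL 62704-1234' );
print join( ' | ', @{$rec}{qw(street city state zip)} ), "\n" if $rec;
```

Records that don't match come back as undef, so they can be set aside for manual review rather than guessed at.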
Re^2: Data Salad Address Problem
by SamCG (Hermit) on Jul 28, 2005 at 14:52 UTC
I agree it's horrible... and unfortunately, I can be sure of very little regarding the formatting. I see a number of records that put commas between city and state (which isn't really a big problem), and some that abbreviate state names with things like "MASS" and "WASH" (oh, joy).
Thanks for the good wishes...
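One way the comma and the odd abbreviations might be handled is with a small alias table. This is only a sketch: `%STATE_ALIAS` and `split_city_state` are hypothetical names, the table lists just the two abbreviations mentioned above, and the final check only confirms that the result has two letters, not that it is a real state.

```perl
use strict;
use warnings;

# Hypothetical lookup of non-standard state abbreviations. Only the two
# seen so far are listed; a real table would need many more entries.
my %STATE_ALIAS = (
    MASS => 'MA',
    WASH => 'WA',
);

# Split "City, STATE" or "City STATE" into a city and a two-letter code.
# Returns an empty list when the last word can't be turned into one.
sub split_city_state {
    my ($text) = @_;
    my ( $city, $state ) = $text =~ /^\s*(.+?)\s*,?\s+([A-Za-z.]+)\s*$/
        or return;

    ( my $key = uc $state ) =~ s/\.//g;    # "Mass." becomes "MASS"
    $key = $STATE_ALIAS{$key} if exists $STATE_ALIAS{$key};

    # Shape check only: two capital letters, not a list of valid states.
    return unless $key =~ /^[A-Z]{2}$/;
    return ( $city, $key );
}

# Made-up inputs for illustration only.
for my $raw ( 'Boston, MASS', 'Seattle WASH', 'Albany, NY' ) {
    my ( $city, $state ) = split_city_state($raw);
    print "$raw => $city / $state\n";
}
```

Each new oddity that turns up in the data becomes one more entry in the table rather than one more regex.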
You're basically going to have to quantify the different possibilities and allow for them individually. I was able to get the zip codes accurately from your sample data:
    # Try field 5 first (ZIP+4, then plain ZIP), then fall back to field 4.
    unless ( ($zip) = ($field5 =~ /(\d{5}-\d{4})/) ) {
        unless ( ($zip) = ($field5 =~ /(\d{5})/) ) {
            unless ( ($zip) = ($field4 =~ /(\d{5}-\d{4})/) ) {
                ($zip) = ($field4 =~ /(\d{5})/);
            }
        }
    }
but that's already pretty nasty...
Actually, I see that the third record from the bottom has a 5-digit ZIP code, with no dash and no four-digit extension... Could be that we need to make the second part optional... Yeah, oh joy...
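Making the second part optional could look something like this. It's a sketch, not a drop-in replacement: `find_zip` is a hypothetical name, and unlike trying the ZIP+4 pattern first, it takes the leftmost five-digit run in each field, so a five-digit street number earlier in the same field would be picked up by mistake.

```perl
use strict;
use warnings;

# Return the first ZIP or ZIP+4 found, checking fields in the order given.
# The (?:-\d{4})? group makes the +4 part optional; \b keeps the pattern
# from matching inside a longer run of digits.
sub find_zip {
    for my $field (@_) {
        next unless defined $field;
        return $1 if $field =~ /\b(\d{5}(?:-\d{4})?)\b/;
    }
    return;    # no ZIP in any field
}

# Made-up field values for illustration only.
print find_zip( 'Boston MA 02110', '1 Beacon St' ),           "\n";
print find_zip( 'Suite 4',         'Olympia WA 98501-1234' ), "\n";
```

Calling it as `find_zip($field5, $field4)` keeps the same field order as the nested version.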
--------------------------------
An idea is not responsible for the people who believe in it...