PerlMonks  

Re: Data Salad Address Problem

by bofh_of_oz (Hermit)
on Jul 28, 2005 at 16:02 UTC ( [id://479013] )


in reply to Data Salad Address Problem

First, determine the logical formatting rules for the data. In your case:

- Records seem to be separated by a blank line (two consecutive \n)
- Each field occupies a fixed number of characters on every line
- Each field can span multiple lines
- ZIP code is in the format /\d{5}-\d{4}/

You can either use a multiline regexp or process each line with substr, pushing the pieces onto the corresponding array(s) or appending them to strings, whatever suits. I'm not clear about the ZIP codes: if they can appear in field 4 or field 5, use a regex to find them; if they only ever appear in field 5 (and we just can't see that because the HTML scrambled the text separators), then substr alone will be fine.
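A minimal sketch of the substr approach, under assumptions that are not in the original data: the field widths in @WIDTHS, the sample addresses, and the helper names parse_records and find_zip are all made up for illustration. It splits the text into records on blank lines, slices each line into fixed-width fields, appends continuation lines to the field they fall under, and then looks for a ZIP+4 in whichever field holds it.

```perl
use strict;
use warnings;

# Hypothetical field widths, for illustration only; derive the real ones from the data.
my @WIDTHS = (12, 14, 10);

# Split the text into records on blank lines, slice each line into
# fixed-width fields with substr, and append continuation lines to the
# field they belong to.
sub parse_records {
    my ($text) = @_;
    my @records;
    for my $block (split /\n\s*\n/, $text) {
        my @fields = ('') x @WIDTHS;
        for my $line (split /\n/, $block) {
            my $offset = 0;
            for my $i (0 .. $#WIDTHS) {
                # Guard against lines shorter than the full record width.
                my $part = $offset < length($line)
                         ? substr($line, $offset, $WIDTHS[$i])
                         : '';
                $offset += $WIDTHS[$i];
                $part =~ s/^\s+|\s+$//g;
                next unless length $part;
                $fields[$i] .= length($fields[$i]) ? " $part" : $part;
            }
        }
        # Skip blocks that contained no data at all.
        push @records, \@fields if grep { length } @fields;
    }
    return @records;
}

# Return the first ZIP+4 found in any of the given fields, or undef.
# Assumes "five digits, hyphen, four digits".
sub find_zip {
    my (@fields) = @_;
    for my $f (@fields) {
        return $1 if $f =~ /\b(\d{5}-\d{4})\b/;
    }
    return;
}

# Made-up sample: two records, the first with a continuation line.
my $sample = join "\n",
    sprintf('%-12s%-14s%-10s', 'John Smith', '12 Long Street', '12345-6789'),
    sprintf('%-12s%-14s',      '',           'Apt 4'),
    '',
    sprintf('%-12s%-14s%-10s', 'Jane Doe',   '1 Main Rd',      '54321-0000');

for my $rec (parse_records($sample)) {
    my $zip = find_zip(@$rec);
    print join(' | ', @$rec), '  [ZIP: ', (defined $zip ? $zip : 'none'), "]\n";
}
```

Scanning every field for the ZIP covers the "field 4 or 5" case; if it turns out the ZIP always sits in one column, reading that field directly is enough.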

HTH

P.S. If you want, we can work on the code later...

--------------------------------
An idea is not responsible for the people who believe in it...

Re^2: Data Salad Address Problem
by djp (Hermit) on Jul 29, 2005 at 03:29 UTC
    Don't assume that you can derive the logical formatting rules correctly from the data supplied. Make every effort to get the supplier of the data to provide you with the rules they used to create the data.
      I agree. However, that works in only about 30% of cases, as data suppliers make every effort not to reveal their "private and confidential" data formats. That is, if they understand them at all... Often, studying the data is the only way to work out the logic. Granted, one would need a lot more data for statistical analysis than is provided here, but I was just outlining the idea...
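One rough way to study the data, sketched here under the assumption that the fields are space-padded: find the character columns that are blank on every line, since those are candidates for field boundaries. The helper name blank_columns and the sample lines are made up for illustration, and a real file would need far more lines before the result means anything.

```perl
use strict;
use warnings;

# Return the 0-based column positions that hold a space (or nothing)
# on every one of the given lines.
sub blank_columns {
    my (@lines) = @_;
    my @used;
    for my $line (@lines) {
        my @chars = split //, $line;
        for my $i (0 .. $#chars) {
            $used[$i] ||= 0;
            $used[$i]++ if $chars[$i] ne ' ';
        }
    }
    return grep { !$used[$_] } 0 .. $#used;
}

# Made-up sample lines.
my @blank = blank_columns('ab  cd', 'a   ef');
print "Always-blank columns: @blank\n";
```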

      --------------------------------
      An idea is not responsible for the people who believe in it...
