
Well, it's been seven hours since the OP was last seen here, so at this point I can take a crack at it for fun without feeling like I'm doing free work.

#!/usr/bin/env perl

use strict;
use warnings;

sub is_street {return shift =~ m/^\d+/;}
sub is_postal {return shift =~ m/^\w+.+\d$/;}

sub street_components {
    my $address = shift;
    if ($address =~ m/^
            (.+)
            \s+APT\s+                       # APT anchor
            (?:\(Range\s+)?                 # Range syntax
            ([\w\d]+(?:\s+-\s+[\w\d]+)?)    # Apartment number
            \)?                             # Closing range syntax
        $/x
    ) {
        return {street => $1, apartment => $2}
    }
    else {
        die "Street address match failure: <<$address>>\n";
    }
}

sub postal_component {return shift}

sub apartment_expand {
    my $apartment_range = shift;
    my ($low, $high) = split /\s*-\s*/, $apartment_range;
    return [$low] if !length($high);

    my ($low_num,  $low_alpha ) = $low  =~ m/^(\d+)(\w+)$/;
    my ($high_num, $high_alpha) = $high =~ m/^(\d+)(\w+)$/;

    my @return;
    foreach my $num ($low_num .. $high_num) {                # Numeric increment.
        foreach my $letter ($low_alpha .. $high_alpha) {     # Alpha increment.
            push @return, "${num}${letter}";
        }
    }
    return \@return;
}

my %record;

while (my $line = <DATA>) {
    chomp $line;
    next unless length $line;
    $record{'addr'}   = street_components($line) if is_street($line);
    $record{'postal'} = postal_component($line)  if is_postal($line);
    if (exists $record{'addr'} && exists $record{'postal'}) {
        my $apartments = apartment_expand($record{'addr'}->{'apartment'});
        foreach my $apartment (@$apartments) {
            printf "%s, APT %s, %s\n" =>
                $record{'addr'}->{'street'},
                $apartment,
                $record{'postal'};
        }
        undef %record;
    }
}

__DATA__
432 10TH ST APT (Range 2A - 2B)
BROOKLYN NY 10598-6601
432 10TH ST APT (Range 3A - 3B)
BROOKLYN NY 10598-6601
432 10TH ST APT (Range 4A - 4B)
BROOKLYN NY 10598-6605
432 10TH ST APT (Range 5A - 5D)
BROOKLYN NY 10598-6605
432 10TH ST APT 6A
BROOKLYN NY 10598-6605

This produces the following output:

432 10TH ST, APT 2A, BROOKLYN NY 10598-6601
432 10TH ST, APT 2B, BROOKLYN NY 10598-6601
432 10TH ST, APT 3A, BROOKLYN NY 10598-6601
432 10TH ST, APT 3B, BROOKLYN NY 10598-6601
432 10TH ST, APT 4A, BROOKLYN NY 10598-6605
432 10TH ST, APT 4B, BROOKLYN NY 10598-6605
432 10TH ST, APT 5A, BROOKLYN NY 10598-6605
432 10TH ST, APT 5B, BROOKLYN NY 10598-6605
432 10TH ST, APT 5C, BROOKLYN NY 10598-6605
432 10TH ST, APT 5D, BROOKLYN NY 10598-6605
432 10TH ST, APT 6A, BROOKLYN NY 10598-6605

It's unfortunate that the data lacks a record separator; that means you have to keep track of what state you are in. If it mattered, I'd do more detection of getting out of sync by verifying we didn't get a city before getting an address.
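For anyone curious what that extra sync checking might look like, here is a rough sketch. It reuses the two predicates from the script, but pair_lines and its error messages are made up for illustration and are not part of the original solution. It also assumes that a line matching both predicates should be treated as a street line, which is a guess about the data rather than something the OP specified.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

sub is_street {return shift =~ m/^\d+/;}
sub is_postal {return shift =~ m/^\w+.+\d$/;}

# Hypothetical helper: pair each street line with the city line that
# follows it, and die as soon as a line arrives out of order.
sub pair_lines {
    my @lines = @_;
    my ($street, @pairs);
    foreach my $line (@lines) {
        next unless length $line;
        if (is_street($line)) {
            die "Two street lines in a row: <<$line>>\n" if defined $street;
            $street = $line;
        }
        elsif (is_postal($line)) {
            die "City line before any street line: <<$line>>\n"
                unless defined $street;
            push @pairs, [$street, $line];
            undef $street;
        }
        else {
            die "Unrecognized line: <<$line>>\n";
        }
    }
    die "Street line without a city line: <<$street>>\n" if defined $street;
    return \@pairs;
}

my $pairs = pair_lines(
    '432 10TH ST APT (Range 2A - 2B)',
    'BROOKLYN NY 10598-6601',
    '432 10TH ST APT 6A',
    'BROOKLYN NY 10598-6605',
);
print join(' | ', @$_), "\n" for @$pairs;

# A city line with no street line before it is reported instead of
# being silently paired with whatever address comes next.
my $ok = eval { pair_lines('BROOKLYN NY 10598-6601'); 1 };
print "Caught: $@" unless $ok;
```

Dying on the first bad line is a deliberate choice here: with no record separator in the input, one misread line would otherwise shift every later pairing by one.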

If one were to use this for anything more than amusement they would quickly discover how fragile the address detection is, and that would lead to a realization of how unfortunate the input data format is.
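To make the fragility concrete, here is a small sketch that runs the same two predicates over a few sample lines. The classify helper is hypothetical, and the last two sample lines are invented for illustration; they are not from the OP's data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Same predicates as in the script above.
sub is_street {return shift =~ m/^\d+/;}
sub is_postal {return shift =~ m/^\w+.+\d$/;}

# Hypothetical helper: report which predicate(s) a line satisfies.
sub classify {
    my $line   = shift;
    my $street = is_street($line);
    my $postal = is_postal($line);
    return 'both'   if $street && $postal;
    return 'street' if $street;
    return 'postal' if $postal;
    return 'neither';
}

my @samples = (
    '432 10TH ST APT 6A',        # from the sample data
    'BROOKLYN NY 10598-6601',    # from the sample data
    '432 10TH ST APT 6',         # invented: apartment with no letter
    'ONE MAIN PLAZA APT 3B',     # invented: street not starting with a digit
);
printf "%-24s => %s\n", $_, classify($_) for @samples;
```

The first two lines classify as expected. The third starts and ends with a digit, so it counts as both a street and a postal line, and the original loop would print it as its own city. The fourth matches neither predicate and would be skipped silently.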


Dave


In reply to Re: Extract data from txt file by davido
in thread Extract data from txt file by bulgin24
