PerlMonks  

Re^3: Illegal division by zero

by AnomalousMonk (Archbishop)
on Jan 24, 2018 at 19:57 UTC ( [id://1207865]=note )


in reply to Re^2: Illegal division by zero
in thread Illegal division by zero

... multiple names I am attempting to match. ... Just added if statements in the while loop.

if-statement patches are probably ok for one-off or infrequent runs with a small, stable city-name list. For larger lists of cities or more frequent runs, I think I would go with a database.

It's also possible to use a regex/hash approach:

c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
"my @cities = ('windsor riverside', ' new york ', 'philadelphia',);
 ;;
 my $rx_city = build_city_regex(@cities);
 print $rx_city;
 ;;
 my %city_digits;
 ;;
 RECORD:
 for my $record (
   'CA006139520,\"WINDSOR RIVERSIDE, ON CA \",2018-01-02,10',
   qq{CA006139520,\" NEW YORK , ON CA \",2018-01-02,987\n},
   'CA006139520,\"NEWYORK, ON CA \",2018-01-02,9999',
   'CA006139520,\"NEW YORK, ON CA \",2018-01-02,10210',
   qq{CA006139520,\"PHILADELPHIA, ON CA \",2018-01-02,76\n},
   ) {
   next RECORD unless my ($city, $digits) =
     $record =~ m{ ($rx_city) .* \b (\d+) \Z }xms;
   push @{ $city_digits{ canonicalize_city($city) } }, $digits
   }
 dd \%city_digits;
 ;;
 sub build_city_regex {
   my ($regex) =
     map qr{ \b (?: $_) \b }xms,
     join ' | ',
     map { (my $c = $_) =~ s{ \s+ }'\s+'xmsg; $c; }
     reverse sort
     map canonicalize_city($_),
     @_
     ;
   return $regex;
   }
 ;;
 sub canonicalize_city {
   my ($city_name) = @_;
 ;;
   die qq{bad city: '$city_name'} if $city_name =~ m{ [^[:alpha:] -] }xms;
   $city_name =~ s{ \A \s+ | \s+ \z }''xmsg;
   $city_name =~ s{ \s+ }' 'xmsg;
   $city_name = uc $city_name;
 ;;
   return $city_name;
   }
 "
(?msx-i: \b (?: WINDSOR\s+RIVERSIDE | PHILADELPHIA | NEW\s+YORK) \b )
{
  "NEW YORK" => [987, 10210],
  PHILADELPHIA => [76],
  "WINDSOR RIVERSIDE" => [10],
}
Something like this will work even with large lists (thousands!) of city names. However, as I said, for a sufficiently high size-frequency metric, it's probably better to use a database.
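A note on the reverse sort in build_city_regex(): Perl tries the alternatives of an alternation left to right and takes the first one that lets the match succeed, so when one city name is a prefix of another, the longer name has to come first. A minimal sketch of the effect follows. The city names and the record are hypothetical, and alternation() is a simplified stand-in I made up for illustration, not the build_city_regex() from the one-liner.

```perl
use strict;
use warnings;

# Hypothetical city names, chosen so that one is a prefix of the other.
my @names = ('NEW YORK', 'NEW YORK CITY');

# Build an alternation from the names in the order given.
# Whitespace inside a name becomes \s+ so it matches any run of spaces.
sub alternation {
    my @alts;
    for my $name (@_) {
        push @alts, join '\s+', map { quotemeta($_) } split ' ', $name;
    }
    my $alt = join '|', @alts;
    return qr{ \b (?: $alt ) \b }xms;
}

# Hypothetical record in the same shape as the examples above.
my $text = 'CA006139520,"NEW YORK CITY, ON CA ",2018-01-02,10';

my $rx_ascending  = alternation(sort @names);          # shorter name tried first
my $rx_descending = alternation(reverse sort @names);  # longer name tried first

my ($short) = $text =~ m{ ($rx_ascending)  }xms;
my ($long)  = $text =~ m{ ($rx_descending) }xms;

print "ascending:  $short\n";   # ascending:  NEW YORK
print "descending: $long\n";    # descending: NEW YORK CITY
```

With a plain string sort, a name always sorts before any longer name it is a prefix of, so reversing the sorted list is enough to put the longer name first.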


Give a man a fish:  <%-{-{-{-<
