Re: regex negative lookahead behaviour

by dws (Chancellor)
on Jul 18, 2003 at 19:34 UTC


in reply to regex negative lookahead behaviour

I'm parsing snail mail addresses, particularly for horrible beasts like "456 4 1/2 MILE RD".

Meta advice:

I've been involved with two projects over the years that have had to parse names and addresses. In both cases, what we ended up with was a system that got ~98% right automatically, then kicked out the remaining 2% for vetting by a human.
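A minimal sketch of that "parse what you can, kick out the rest" shape, written in Python for brevity (the same structure carries over to a Perl regex). The pattern, the suffix list, and the names `STREET` and `triage` are illustrative assumptions, not what either project actually used:

```python
import re

# Hypothetical pattern: house number, free-form street part, known suffix.
# The suffix list is deliberately tiny; a real one would be much longer.
STREET = re.compile(
    r"""^(?P<number>\d+)\s+
        (?P<street>.+?)\s+
        (?P<suffix>RD|ST|AVE|BLVD|DR|LN)\.?$""",
    re.VERBOSE | re.IGNORECASE,
)

def triage(addresses):
    """Split addresses into (parsed, needs_review)."""
    parsed, needs_review = [], []
    for raw in addresses:
        m = STREET.match(raw.strip())
        if m:
            parsed.append(m.groupdict())
        else:
            # Anything the pattern can't handle goes to a human.
            needs_review.append(raw)
    return parsed, needs_review

parsed, needs_review = triage(["456 4 1/2 MILE RD", "123 Main St.", "PO BOX 12"])
```

Here the first two addresses land in `parsed` (the first with street part `4 1/2 MILE`), and `PO BOX 12` lands in `needs_review`. The point is that the "else" branch is a first-class output, not an error.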

It's a diminishing returns problem: If you model the economics, at some point you have to stop pouring your effort into matching the remaining pool of difficult addresses, and let a human being do it.
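One rough way to model the economics, sketched in Python. Every figure here is a made-up placeholder and `worth_automating` is a hypothetical helper; plug in your own volumes and costs:

```python
def worth_automating(remaining, fraction_fixed, dev_cost, review_cost_each):
    """Return True if another round of pattern work costs less than
    having humans vet the records that round would have fixed.

    remaining        -- records the current rules still fail on
    fraction_fixed   -- share of those the new rules are expected to catch (0..1)
    dev_cost         -- cost of writing and testing the new rules
    review_cost_each -- cost of one human review
    """
    saved = remaining * fraction_fixed * review_cost_each
    return saved > dev_cost

# Placeholder numbers only: 2,000 leftovers, new rules catch a quarter of them.
small_batch = worth_automating(2_000, 0.25, dev_cost=800, review_cost_each=0.50)
# Same rules, but 200,000 leftovers.
large_batch = worth_automating(200_000, 0.25, dev_cost=800, review_cost_each=0.50)
```

With these placeholder figures the small batch comes out `False` (250 saved against 800 spent) and the large batch comes out `True`, which is why the stopping point depends so heavily on volume.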

Re: Re: regex negative lookahead behaviour
by BazB (Priest) on Jul 20, 2003 at 11:19 UTC

    Unfortunately, dws's approach isn't always possible.

    The system I work on handles significant volumes of addresses, using dedicated (commercial) software to handle the identification and validation of that information.
    There is in-house processing to help smooth things out, but not every problem can be catered for.

    The volumes involved make it impractical for humans to process the problem records (and in fact some of those problem records are a direct result of human input).

    Even if the volumes were low enough for humans to be able to process exceptions, humans can't get it right all of the time.
    This might be because of a lack of information, poorly laid out information, or just human error.

    As shemp describes, even the reference data used in such validation systems isn't perfect, and this sets the upper limit on what you can reasonably expect to achieve.

    I'd say that you can never expect to get things 100% right, and it might end up being cheaper and/or easier to accept a certain error rate.
    Of course, your client may not accept this, but that's a whole other problem :-)

    Cheers.

    BazB


    If the information in this post is inaccurate, or just plain wrong, don't just downvote - please post explaining what's wrong.
    That way everyone learns.
