PerlMonks  
I know the title is a little misleading, but once you see what I want to be able to do, it might not be all that misleading...

I'm having a problem with what I thought would be a small project. I have a list of individual names and companies, all together in one long list. Nothing indicates whether an entry is a person's name or a company name. I have to compare it against another list and find possible conflicts. Anyone who has done this can imagine the possibilities; I'll just list a few here...
A search for Aetna Insurance Company should match...
  • Aetna Insurance Company
  • Aetna Ins. Co.
  • Aetna Insurance Co.
  • Aetna Ins. Company
  • Aetna Ins Company
  • Aetna Insurance Co
  • Aetna Ins Co
That's only a small sample of the variations I've come up with...
A search for Sam Jones should bring up...
  • Sam Jones
  • Sam J. Jones
  • Sam J Jones
  • S. Jones
  • Samuel Jones
This is why I used the title "fuzzy logic". Is there any kind of Perl module out there that can do this? Even a bunch of modules that each handle part of this would be good. One in particular that my boss brought up is Soundex. However, I'm finding that Soundex is really only good for misspellings like Smith and Smyth.
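
To make the Soundex limitation concrete, here is a minimal sketch. It assumes a hand-rolled, simplified American Soundex so that it runs without installing anything; the sub name `simple_soundex` is hypothetical and written only for this illustration (the Text::Soundex module on CPAN provides a ready-made `soundex` function). Words are coded one at a time.

```perl
use strict;
use warnings;

# Letter-to-digit table for American Soundex.
# Vowels, Y, H and W are deliberately absent (they get no digit).
my %CODE;
@CODE{qw(B F P V)}         = (1) x 4;
@CODE{qw(C G J K Q S X Z)} = (2) x 8;
@CODE{qw(D T)}             = (3) x 2;
$CODE{L}                   = 4;
@CODE{qw(M N)}             = (5) x 2;
$CODE{R}                   = 6;

# Hypothetical helper: simplified Soundex for a single word.
sub simple_soundex {
    my ($word) = @_;
    (my $letters = uc $word) =~ s/[^A-Z]//g;   # drop dots, spaces, digits
    return '' unless length $letters;

    my @chars = split //, $letters;
    my $first = shift @chars;
    my $out   = $first;
    my $prev  = $CODE{$first} // 0;

    for my $ch (@chars) {
        next if $ch eq 'H' || $ch eq 'W';      # H/W do not break a run of equal codes
        my $code = $CODE{$ch} // 0;            # vowels and Y code to 0
        $out .= $code if $code && $code != $prev;
        $prev = $code;
        last if length($out) == 4;
    }
    return substr($out . '000', 0, 4);         # pad to letter + three digits
}

for my $pair (['Smith', 'Smyth'], ['Ins', 'Insurance'],
              ['Co', 'Company'],  ['Sam', 'Samuel']) {
    my ($x, $y) = @$pair;
    printf "%-6s %s   %-10s %s\n",
        $x, simple_soundex($x), $y, simple_soundex($y);
}
# Smith/Smyth both code to S530, but Ins (I520) vs Insurance (I526),
# Co (C000) vs Company (C515) and Sam (S500) vs Samuel (S540) do not match.
```

So a phonetic code catches the spelling variant but not the abbreviations or the short form of a first name, which are the cases in the lists above.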

I can't imagine this is an easy question to answer, and I'm sure there's not going to be one all-encompassing module to do this. I'm just hoping for some pointers and maybe some modules that could do bits and pieces of this.

Thanks in advance Monks...

In reply to Some kind of fuzzy logic. by the_0ne
