The replies above offer good advice, but it may be profitable (that is, more efficient) to normalize the capitalization before dealing with the more complex fuzzy-matching needs described in item 3.
I would be seriously inclined to see whether lc'ing everything and then uc'ing the first letter of each word minimizes the work.
However, this scheme is suggested on the basis of one snippet of your data. If you have to distinguish between Mr. MacHinery and (something) Machinery, *OR* if capitalization in the output needs to be not only consistent but also "correct" (for unknown values of correct), you will need something far better than this simple-minded scheme.
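A minimal sketch of the lc-then-ucfirst idea, assuming "word" simply means a whitespace-separated chunk. The sub name `normalize_caps` is hypothetical, not something from the original thread.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase the whole word, then uppercase its
# first letter. split ' ' splits on runs of whitespace (and ignores
# leading whitespace), so join ' ' collapses multiple spaces to one.
sub normalize_caps {
    my ($s) = @_;
    return join ' ', map { ucfirst(lc($_)) } split ' ', $s;
}

print normalize_caps("mr. MacHinery"), "\n";     # Mr. Machinery
print normalize_caps("ACME  widget CO"), "\n";   # Acme Widget Co
print normalize_caps("acme machinery"), "\n";    # Acme Machinery
```

The first and third calls show the caveat above: "MacHinery" and "Machinery" both come out as "Machinery", so any distinction carried only by capitalization is lost once the data has been normalized this way.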