PerlMonks  


Hi,

I've read enough about encodings and collations and such to make my brain squishy. However, I still can't figure out how to solve my problem.

The problem is I need to compare character data in a way consistent with MySQL. I am comparing data in a table across two servers. Assume the tables have identical structures in every way, but may contain different data. I connect to each server and select col1 from each table. col1 is a varchar column with charset latin1 and collation latin1_swedish_ci (these are defaults). Now I loop through the rows returned from each table, comparing the data to determine which rows are extra or missing between the two tables.
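The row-by-row comparison described above could be sketched roughly as follows. This is only an illustration, not the actual code: `diff_sorted` and its comparator callback are hypothetical names, and it assumes both result sets arrive already sorted in the same order, with the comparator agreeing with that order.

```perl
use strict;
use warnings;

# Walk two lists that are already sorted in the same order and report
# which values appear only in the first or only in the second.
# $cmp is a hypothetical comparator callback returning a negative value,
# zero, or a positive value; it must agree with the ORDER BY that
# produced both lists.
sub diff_sorted {
    my ($left, $right, $cmp) = @_;
    my (@only_left, @only_right);
    my ($i, $j) = (0, 0);

    while ($i < @$left && $j < @$right) {
        my $c = $cmp->($left->[$i], $right->[$j]);
        if    ($c < 0) { push @only_left,  $left->[$i++]  }
        elsif ($c > 0) { push @only_right, $right->[$j++] }
        else           { $i++; $j++ }
    }

    # Whatever remains on either side has no counterpart.
    push @only_left,  @{$left}[$i .. $#$left];
    push @only_right, @{$right}[$j .. $#$right];

    return (\@only_left, \@only_right);
}

my ($missing, $extra) = diff_sorted(
    [qw(apple Banana cherry)],
    [qw(apple cherry date)],
    sub { lc $_[0] cmp lc $_[1] },
);
print "only on server 1: @$missing\n";
print "only on server 2: @$extra\n";
```

The whole approach stands or falls on the comparator: if it disagrees with the order the server used, the walk gets out of step and reports rows as missing that are actually present.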

MySQL sorts case-insensitively under this collation (hence the _ci suffix in latin1_swedish_ci). I can duplicate that part without a problem: I just use "lc $a cmp lc $b".

This works okay until I come to a word like éclair. MySQL sorts it in among the words beginning with "e". Perl's cmp compares code points, so it thinks é is greater than "z".
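A small script reproduces the mismatch. The word list is made up for illustration, and the script assumes its own source file is saved as UTF-8:

```perl
use strict;
use warnings;
use utf8;   # the source below contains a literal é

my @words = ('zebra', 'éclair', 'Eagle', 'echo');

# Case-insensitive comparison as described above: lowercase both sides,
# then compare code point by code point.
my @sorted = sort { lc $a cmp lc $b } @words;

binmode STDOUT, ':encoding(UTF-8)';
print join(', ', @sorted), "\n";
```

Because é is U+00E9 and "z" is U+007A, éclair ends up last in the list, after zebra, which is not where the MySQL collation puts it.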

I can go the other way, and make MySQL sort things the same way as Perl with "ORDER BY BINARY col1", but this defeats indexes and makes MySQL sort inefficiently, and that is highly undesirable for my purposes.

I feel sure there's a way to do this, but danged if I can figure out how. I have 8 Firefox tabs open with everything from perluniintro to Encode::Byte, and my brain is full :-) Anyone? Anyone?

Thank you!

UPDATE: I'm going to try the suggestion of reaching back to MySQL and asking it to compare the strings for me. Thank you all very much for the great ideas!
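One way the "ask MySQL" idea might look is sketched below. Treat it as an untested assumption rather than the approach the replies actually proposed: `mysql_cmp` is a hypothetical helper, `$dbh` is assumed to be a connected DBI handle for a MySQL server, and the charset and collation are spelled out explicitly because bound parameters would otherwise be compared under the connection's collation rather than the column's.

```perl
use strict;
use warnings;

# Hypothetical helper: ask the server to compare two strings under an
# explicit charset and collation, so the answer matches how the column
# itself is ordered. $dbh is assumed to be a connected DBI handle.
sub mysql_cmp {
    my ($dbh, $x, $y) = @_;
    my ($result) = $dbh->selectrow_array(
        'SELECT STRCMP('
      . 'CONVERT(? USING latin1) COLLATE latin1_swedish_ci, '
      . 'CONVERT(? USING latin1) COLLATE latin1_swedish_ci)',
        undef, $x, $y,
    );
    return $result;   # -1, 0 or 1, the same convention as Perl's cmp
}

# Usage sketch (the DSN and credentials are placeholders):
#   use DBI;
#   my $dbh = DBI->connect('DBI:mysql:database=test;host=server1',
#                          $user, $pass, { RaiseError => 1 });
#   my $order = mysql_cmp($dbh, 'éclair', 'zebra');
```

Each call costs a round trip to the server, so this probably only makes sense when the comparison is needed for a modest number of rows, or when the results can be cached.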


In reply to locales, encodings, collations, charsets... how can I match a given MySQL collation? by xaprb
