in reply to Re: Strange Unicode normalization question
in thread Strange Unicode normalization question

Thanks again. This code was a bit of a mess, and your comments and the others' have helped me see what was going wrong. I apologise for not providing better information earlier, but there was a lot of code for something that should have been quite simple. This is what the original code did:

  1. Opened the data file with encoding(UTF-8)
  2. Read a line of comma-separated strings from it and split them on the comma
  3. Put the split fields into a hash with keys describing the data
  4. Passed the hash to a hand-written function that was supposed to produce an application/x-www-form-urlencoded string, but the function was broken: it just stuck an '&' between each key=value pair, so the data wasn't form-encoded at all
  5. Passed the resulting string through NFKD and did the substitution I described earlier
  6. Encoded the resulting string as UTF-8 with encode
  7. POSTed the resulting string with LWP
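To make the broken step concrete, here is a minimal sketch (not the original code; the field names and values are made up for illustration) contrasting the naive "glue with '&'" approach with WWW::Form::UrlEncoded's build_urlencoded. Note that build_urlencoded works on bytes, so character strings are encoded to UTF-8 first:

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode_utf8);
use WWW::Form::UrlEncoded qw(build_urlencoded);

# Hypothetical fields standing in for the CSV-derived hash
my %fields = ( name => 'Müller & Sons', city => 'Zürich' );

# The broken hand-rolled version: no percent-encoding at all, so the
# literal '&' inside a value corrupts the pair structure
my $naive = join '&', map { "$_=$fields{$_}" } sort keys %fields;

# Proper application/x-www-form-urlencoded: reserved and non-ASCII
# characters are percent-encoded, so 'ü' becomes %C3%BC and '&' becomes %26
my $proper = build_urlencoded(
    map { $_ => encode_utf8( $fields{$_} ) } sort keys %fields
);
```

With the naive string, the server sees three '&'-separated chunks instead of two pairs, which is presumably why the API choked on diacritic-bearing input in the first place.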

So it was horribly broken: it never form-encoded properly, and the NFKD call was a workaround he discovered, which I suspect only appeared to work because the API does its own normalization (that would not surprise me). I replaced the hand-written (incorrect) form encoding with WWW::Form::UrlEncoded's build_urlencoded, and, as you both say, the NFKD is then a no-op, as is the substitution, and it works. The confusion arose because, when the original code (without the NFKD) didn't work, API support told him to turn diacritics into plain characters. The actual code was a lot more complicated than this, and the more I looked at it the more problems I found, so I've spent most of the day rewriting it.
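For anyone else who lands here: the "turn diacritics into normal characters" workaround was presumably the usual NFKD-then-strip-combining-marks idiom, sketched below (my reconstruction, not the original code). It also shows why the call is a no-op on plain ASCII, which is what the properly encoded pipeline effectively sends:

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFKD);

my $city = "Zürich";

# NFKD decomposes 'ü' (U+00FC) into 'u' plus U+0308 COMBINING DIAERESIS,
# so the 6-character string becomes 7 characters
my $decomposed = NFKD($city);

# Stripping nonspacing marks then leaves the bare ASCII letters
( my $stripped = $decomposed ) =~ s/\p{Mn}//g;    # "Zurich"

# On pure ASCII there is nothing to decompose, so NFKD is the identity
my $ascii = "Zurich";
my $same  = NFKD($ascii);
```

Once the form encoding is correct, none of this is needed, because '%C3%BC' round-trips through the request intact.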

Thanks again for your insights.