http://qs321.pair.com?node_id=11117340


in reply to matching characters eg å

Convert from non-native representations at the boundaries, as close to ingress and egress as possible. Here's an example:

    #!/usr/bin/env perl

    BEGIN { binmode STDOUT, ':encoding(utf-8)'; }

    use strict;
    use warnings;
    use utf8;
    use HTML::Entities;

    my $input = "H&aring;ppy";
    print "Input was: $input\n";

    my $native = decode_entities($input);
    print "Decoded to native format: $native\n";

    if ($native =~ m/å/) {
        print "Found an å.\n";
    }

The output is:

    Input was: H&aring;ppy
    Decoded to native format: Håppy
    Found an å.

By decoding the HTML entity as close to ingress as possible, you no longer have to deal with an extra layer that complicates every later manipulation of the string.

Think in terms like this:

  1. Accept input.
  2. Transform input to a native format.
  3. Work with the native format string.
  4. Transform from native to external format.
  5. Send output to external resource.
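The five steps above can be sketched as a full round trip. This is only a minimal illustration: `shout_entities` is a hypothetical helper name, the sample input is assumed to arrive with an HTML entity in it, and the "work" in step 3 is just a stand-in substitution. It uses `decode_entities` and `encode_entities` from HTML::Entities.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use HTML::Entities qw(decode_entities encode_entities);

# Hypothetical helper covering steps 2 through 4.
sub shout_entities {
    my ($external) = @_;

    # Step 2: transform the external (entity-encoded) form to native characters.
    my $native = decode_entities($external);

    # Step 3: work with the native string; here, replace every å with Å.
    $native =~ s/å/Å/g;

    # Step 4: transform back to the external format.
    return encode_entities($native);
}

# Step 1: accept input (assumed sample value).
my $input = 'H&aring;ppy';

# Step 5: send output to the external resource.
binmode STDOUT, ':encoding(UTF-8)';
print shout_entities($input), "\n";    # prints "H&Aring;ppy"
```

Note that the substitution in the middle never has to know that entities exist; only the two lines at the edges do.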

If you always keep transformations as close to the edge as possible, your code will be a lot simpler.


Dave