utf8 is_valid on concat

threedaygoaty has asked for the wisdom of the Perl Monks concerning the following question:

Hi folks, we have suffered greatly at the hands of utf8 while converting a very large application to ut8dom. It seems that we are concatenating UTF8 and non-UTF8 strings and ending up in trouble. So why not globally overload the concat operator with something that checks is_valid before doing the concat, just for debugging? Can it be done in Perl?
Cheers,
TDG

Re: utf8 is_valid on concat by Juerd (Abbot) on Mar 13, 2008 at 11:57 UTC
Are you perhaps doing any of the following?
- Using the :utf8 layer for input (open, binmode)
- Using _utf8_on anywhere
- Forgetting to decode input
- Forgetting to encode output
Please describe in more detail what "trouble" is, and how you get there.
Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }
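For readers following along, the last two items on that checklist (decode input, encode output) can be sketched roughly as below. This is only an illustration using the core Encode module; the variable names and the sample data are made up, and a real application would do the decoding and encoding at its actual I/O boundaries.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample data: the UTF-8 encoded bytes for "cafe" with an
# acute accent, as they might arrive from a file or a socket.
my $input_bytes = "caf\xC3\xA9";

# Decode once at the input boundary: bytes become a Perl character string.
# FB_CROAK makes malformed input die; LEAVE_SRC keeps $input_bytes intact.
my $text = decode('UTF-8', $input_bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);

# Inside the program, work only with character strings.
my $greeting = "Bonjour, " . $text;

# Encode once at the output boundary: characters become bytes again.
my $output_bytes = encode('UTF-8', $greeting);

printf "%d characters in, %d bytes out\n", length($text), length($output_bytes);
```

If every input is decoded and every output is encoded like this, a concatenation inside the program never sees undecoded bytes next to character strings.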
Re^2: utf8 is_valid on concat by threedaygoaty (Novice) on Mar 14, 2008 at 03:53 UTC
Thanks for this. AFAIK we are using the utf8 functions correctly. However, it is very likely that somewhere in 200K lines of Perl a simple string concat is happening between a UTF8 string and a non-UTF8 string. My idea is to try to trap this so we can find it. The only way I can think of doing this is to overload the concat operator. I'm also keen to know whether it is possible. We tried this, but it does not work: `use strict; package UNIVERSAL; use overload "." => \&concat; sub concat { my ($a, $b) = @_; return join "XXX", $a, $b; } package Foo; my $thing = "cat" . "dog"; print "\n\n\n$thing\n\n\n";` This produces catdog
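A note on why the attempt above prints catdog: overload only takes effect when at least one operand is an object blessed into a package that declares the overload, and plain string literals are not objects. The sketch below shows the operator being intercepted for a wrapper object and used to record suspicious concatenations. TracedString, is_risky_bytes and @MIXED are hypothetical names invented for this illustration, and the check (one operand carries the UTF8 flag, the other is a byte string containing bytes above 0x7F) is an assumption about what "trouble" means here.

```perl
use strict;
use warnings;

# Hypothetical wrapper class: "." is only intercepted for objects
# blessed into a package that declares the overload.
package TracedString;

use overload
    '.'  => \&concat,
    '""' => sub { ${ $_[0] } };

our @MIXED;    # "file:line" of each suspicious concatenation

sub new {
    my ($class, $value) = @_;
    return bless \$value, $class;
}

# True for a byte string (no UTF8 flag) that contains non-ASCII bytes.
sub is_risky_bytes {
    my ($str) = @_;
    return !utf8::is_utf8($str) && $str =~ /[\x80-\xFF]/;
}

sub concat {
    my ($self, $other, $swapped) = @_;
    my ($left, $right) = ($$self, "$other");
    ($left, $right) = ($right, $left) if $swapped;

    # Record the caller when flagged characters meet raw high-bit bytes.
    if (   (utf8::is_utf8($left)  && is_risky_bytes($right))
        || (utf8::is_utf8($right) && is_risky_bytes($left)))
    {
        my (undef, $file, $line) = caller;
        push @MIXED, "$file:$line";
    }
    return TracedString->new($left . $right);
}

package main;

# Character string (the wide character forces the UTF8 flag on).
my $chars = TracedString->new("caf\x{e9} \x{263A}");
# Undecoded UTF-8 bytes, as they might come from unconverted code.
my $bytes = "caf\xC3\xA9";

my $fine   = $chars . "dog";     # ASCII only: nothing recorded
my $joined = $chars . $bytes;    # flagged string plus raw bytes: recorded

print "suspicious concatenation at $_\n" for @TracedString::MIXED;
```

This only helps where strings can be wrapped at a few entry points, so it is a debugging aid rather than a global hook.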
Re^3: utf8 is_valid on concat by Juerd (Abbot) on Mar 16, 2008 at 01:29 UTC
Have a look at encoding::warnings :)
Re: utf8 is_valid on concat by haoess (Curate) on Mar 13, 2008 at 09:14 UTC
Could you please provide a piece of code that shows the behaviour you describe (the "trouble"), along with your perl version? Update: And what is ut8dom? -- Frank
