I now get the 'unchecked input' part, and I can sort of see the issues with tainting. About the C code: you mean C code that tries to interpret the invalid UTF-8, right? C's basic string operations don't look at the encoding at all, so they are just as (un)safe when you pass them a string that isn't flagged as UTF-8 but contains arbitrary binary data.
Update: about the (removed) line "Another possibility is careless use of utf8::upgrade().": was that removed because utf8::upgrade() is always safe (as long as the UTF8 flags are valid to begin with)?