http://qs321.pair.com?node_id=1190946


in reply to Re^2: \b in Unicode regex
in thread \b in Unicode regex

Thanks a lot, Monks.

Knowing that there's no issue with \b, I kept investigating. It turned out that one of the strings wasn't really utf8 (though for some reason my terminal insisted on printing it as utf8). utf8::decode solved the problem.
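A minimal sketch of that fix, assuming the text arrives as raw UTF-8 bytes (the sample string and the variable name `$text` are hypothetical). utf8::decode is built into Perl's core, works in place, and returns false if the bytes are not valid UTF-8, so the result is worth checking:

```perl
use strict;
use warnings;

# Hypothetical input: the UTF-8 bytes of "café", written with \x escapes
# so this source file itself stays pure ASCII.
my $text = "caf\xC3\xA9";
print length($text), "\n";    # 5: four letters, but "é" is two bytes

# Decode in place; utf8::decode returns false on malformed UTF-8.
utf8::decode($text) or die "input is not valid UTF-8\n";
print length($text), "\n";    # 4: one element per code point
```

The length dropping from 5 to 4 is a quick way to confirm the string now holds characters rather than bytes.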

Re^4: \b in Unicode regex
by ikegami (Patriarch) on May 23, 2017 at 14:08 UTC

    You actually had the opposite problem: you had UTF-8-encoded bytes, but the regex engine expects a string of Unicode Code Points[1]. utf8::decode produces the latter from the former.


    1. More specifically, it's \w, \b, \d, etc. that are defined in terms of Unicode Code Points.
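To illustrate the footnote, here is a small sketch comparing \b on the raw bytes and on the decoded string (the sample string and variable names are hypothetical). It relies on Perl's default rules: a string that is not internally UTF-8 gets only ASCII semantics for \w, while a decoded string containing non-ASCII characters gets Unicode semantics.

```perl
use strict;
use warnings;

# Hypothetical input: the UTF-8 bytes of "café bar".
my $bytes = "caf\xC3\xA9 bar";
my $chars = $bytes;
utf8::decode($chars) or die "input is not valid UTF-8\n";

# Is there a word boundary right after "caf"?
my $bytes_boundary = $bytes =~ /\Acaf\b/ ? 1 : 0;
my $chars_boundary = $chars =~ /\Acaf\b/ ? 1 : 0;

print "bytes: $bytes_boundary\n";   # 1: byte \xC3 is not \w, so \b matches
print "chars: $chars_boundary\n";   # 0: "é" is \w, so "café" is one word
```

With the /u modifier or `use feature 'unicode_strings'`, the byte \xC3 would be read as the letter "Ã" and the bytes result would change, but the regex would still be treating each byte as a character; decoding is what actually gives \b the intended word boundaries.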