## I set $_ to your sample string cut-n-pasted, then ran it through decode
DB<33> $_ = Encode::decode( q{UTF-8}, $_ )

## Afterwards this worked (U+2013 is EN DASH); if you're not interested in what
## the separator was you can of course change that bit to non-capturing
DB<38> x m{ ^ (\d+) \. \s+ (.*?) \s+(-|\N{EN DASH}|\N{EM DASH})\s+ (.*?) $}x
0  123
1  'The Quick brown fox'
2  '\x{2013}'
3  'jumped over'
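For anyone who'd rather try this outside the debugger, here is a rough standalone sketch of the same idea. The sample string is my own reconstruction from the output above (assumed to arrive as raw UTF-8 bytes, hence the \xE2\x80\x93 escape for the en dash), the variable names are made up for illustration, and the separator group is switched to non-capturing as mentioned.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use charnames ':full';    # only needed for \N{...} on perls older than 5.16

# Hypothetical sample: raw UTF-8 bytes, as if read from a file or terminal
# without any decoding layer. \xE2\x80\x93 is the UTF-8 encoding of U+2013.
my $bytes = "123. The Quick brown fox \xE2\x80\x93 jumped over";

# Same pattern as in the debugger session, with a non-capturing separator.
my $re = qr{
    ^ (\d+) \. \s+                                  # leading number and dot
    (.*?)                                           # text before the separator
    \s+ (?: - | \N{EN DASH} | \N{EM DASH} ) \s+     # hyphen, en dash or em dash
    (.*?) $                                         # text after the separator
}x;

# Against the undecoded bytes the dash is three separate octets, so the
# pattern has nothing to match as a separator.
my @raw = $bytes =~ $re;

# Decode the octets into characters; U+2013 is now a single character.
my $line = decode( 'UTF-8', $bytes );
my ( $num, $before, $after ) = $line =~ $re;

print "raw bytes matched: ", ( @raw ? "yes" : "no" ), "\n";
print "decoded: $num|$before|$after\n";
```

The point of the comparison is only that the pattern needs decoded characters to see the dash; whether your real input needs an explicit decode() or an :encoding(UTF-8) layer on the filehandle depends on how you read it.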