Re: \b in Unicode regex

Two points from perlrebackslash are worth noting.

"\w is a character class that matches any single word character (letters, digits, Unicode marks, and connector punctuation (like the underscore))." [my emphasis]

From the "Assertions" section:

"\b ... matches at any place between a word (something matched by \w) and a non-word character" [my emphasis again]

In your reply with actual data, you're effectively trying to match "XXXXX", which occurs in your string as "_XXXXX.". Both '_' and 'X' match "\w": "\b" does not match between '_' and 'X'.
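Here's a minimal sketch of that behaviour, in case it helps. The helper name bounded_match and the test strings are my own inventions for illustration (they are not your actual data, apart from the `ה_שפירא.mp3` substring); the pattern simply wraps the search term in "\b" on both sides.

```perl
use strict;
use warnings;
use utf8;                             # this source contains literal Hebrew
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: 1 if $needle occurs in $haystack with \b on both sides.
sub bounded_match {
    my ($needle, $haystack) = @_;
    return $haystack =~ /\b\Q$needle\E\b/ ? 1 : 0;
}

my @cases = (
    [ 'XXXXX', '_XXXXX.'     ],   # '_' is \w, so no \b before the first 'X'
    [ 'XXXXX', ' XXXXX,'     ],   # space and comma are \W, so \b holds on both sides
    [ 'שפירא', 'ה_שפירא.mp3' ],   # same failure with Hebrew letters
    [ 'שפירא', ' שפירא,'     ],   # same success with Hebrew letters
);

for my $case (@cases) {
    my ($needle, $haystack) = @$case;
    printf "%s in '%s': %s\n", $needle, $haystack,
        bounded_match($needle, $haystack) ? 'match' : 'no match';
}
```

The ASCII and Hebrew cases behave identically: the match fails only where the term is preceded by '_'. If an underscore should count as a separator, one option is to replace "\b" with lookarounds such as (?<![^\W_]) and (?![^\W_]), though whether that suits your data is for you to judge.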

As already demonstrated twice^[1,2], there is no Unicode issue here.

— Ken

Re^2: \b in Unicode regex by Arik123 (Beadle) on May 23, 2017 at 09:28 UTC
The string I tried to match (that $_) is actually found twice in $string. The first time it's indeed preceded by _, but the second time it's between a space and a comma. Thank you all for your time, again.
Re^3: \b in Unicode regex by kcott (Archbishop) on May 24, 2017 at 04:51 UTC
I was certain that I had checked that before posting my reply; however, I went back and double-checked just now. `שפירא` occurs only once, in the substring `ה_שפירא.mp3`. We can only comment on the data you show us. — Ken
Re^4: \b in Unicode regex by Arik123 (Beadle) on May 24, 2017 at 06:11 UTC
That's not really important, now that the issue is solved. However, that substring does indeed appear twice. I don't know if your browser works like mine, but if it does, then the substring you refer to occurs in the third line of the big string I posted, and the second occurrence is in the 7th and 8th lines (my browser prints a + sign at every line break; maybe that's what confused you). Again, thank you all, Monks, for your time and help.
