PerlMonks
> \b{wb} doesn't seem to take initial ones as part of words
Good catch!

> my conclusion is that the only way to handle the OP problem in a way fully consistent with \b{wb} semantics is to just split using it, and maybe repack non-word fragments afterwards

My intuition says: split on non-words like whitespace, reject "words" that contain no \w or equivalent characters, and repack the rest afterwards. I doubt it's possible to cover all desirable edge cases with \b{wb}. What counts as desirable will depend on the user's perspective, especially when considering multi-language environments and Unicode.
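A rough sketch of the split, reject, and repack idea, untested against the OP's real data. The name `split_words` and the `word`/`gap` labels are made up for illustration, and I'm assuming plain `\s+` as the separator and `\w` as the test for "is a word". Newlines are treated like any other whitespace here, so they would need their own handling.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [type, text] pairs,
# where type is 'word' or 'gap' (everything between words).
sub split_words {
    my ($text) = @_;
    my @out;

    # The capturing group makes split return the separators as well.
    for my $piece ( split /(\s+)/, $text ) {
        next if $piece eq '';    # leading separator yields an empty field

        # Reject "words" that contain no \w character at all.
        my $type = $piece =~ /\w/ ? 'word' : 'gap';

        if ( $type eq 'gap' && @out && $out[-1][0] eq 'gap' ) {
            $out[-1][1] .= $piece;    # repack adjacent non-word fragments
        }
        else {
            push @out, [ $type, $piece ];
        }
    }
    return @out;
}

print join( '', map {"[$_->[0]:$_->[1]]"} split_words("don't stop -- now!") ), "\n";
```

With that input the `--` is rejected as a word and merged with the whitespace on both sides into one gap, while `don't` and `now!` stay whole. Whether trailing punctuation should stay attached to the word is exactly the kind of edge case that depends on perspective.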
Cheers Rolf

In reply to Re^6: Splitting multiline string into words, the stuff between words, and newlines
by LanX