Defining Characters in Word Boundary?

iaw4 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Defining Characters in Word Boundary? by ikegami (Patriarch) on Jan 19, 2011 at 22:47 UTC
`/\b/` is equivalent to `/(?<=\w)(?!\w)|(?<!\w)(?=\w)/`. Feel free to replace `\w` with a character class. "It would be tedious to have to write `\\$keyword([^a-zA-Z])` and then have to substitute back `$1` (because I do not want it eaten)." Don't eat it if you don't want to add it back. Equivalent without eating: `\\$keyword(?=[^a-zA-Z])`. But you surely meant `\\$keyword(?![a-zA-Z])`. In general, it's easier to extract the keyword, then check if it's the one you want: `\\([a-zA-Z]+)`
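Following the suggestion to replace `\w` with a character class, here is a minimal sketch of a "LaTeX-name boundary" that has the same shape as `\b` but treats only `[a-zA-Z]` as name characters. The names `$L`, `$lb`, and `has_macro` are illustrative and not from the thread.

```perl
use strict;
use warnings;

# Same shape as \b, i.e. (?<=\w)(?!\w)|(?<!\w)(?=\w),
# but with \w replaced by the LaTeX name class [a-zA-Z].
my $L  = qr/[a-zA-Z]/;
my $lb = qr/(?:(?<=$L)(?!$L)|(?<!$L)(?=$L))/;

# True if $text contains the macro \$keyword as a complete name.
sub has_macro {
    my ($text, $keyword) = @_;
    return $text =~ /\\\Q$keyword\E$lb/ ? 1 : 0;
}

for my $text ('\\Chi_2', '\\Chi+3', '\\Chix', '\\Chi') {
    printf "%-8s %s\n", $text, has_macro($text, 'Chi') ? 'match' : 'no match';
}
```

Because `_` is not in `[a-zA-Z]`, `\Chi_2` counts as the macro `\Chi` followed by other text, while `\Chix` is a different macro and is rejected.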
Re^2: Defining Characters in Word Boundary? by Jim (Curate) on Jan 20, 2011 at 01:30 UTC
"In general, it's easier to extract the keyword, then check if it's the one you want." I agree wholeheartedly. Since the LaTeX name constraint is exact and well understood (the characters 'a' through 'z' and 'A' through 'Z'), you only need to match those characters. Explicitly matching the right-hand boundary isn't necessary: a greedy `[a-zA-Z]+` already consumes every letter of the name and stops at the first non-letter.
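A small sketch of that point, assuming the macro name is whatever run of letters follows the backslash; `macro_name` is a hypothetical helper, not something from the thread.

```perl
use strict;
use warnings;

# Return the macro name after the first backslash, or undef if none.
# The greedy [a-zA-Z]+ takes every letter and stops at the first
# non-letter, so no right-hand boundary assertion is needed.
sub macro_name {
    my ($text) = @_;
    return $text =~ /\\([a-zA-Z]+)/ ? $1 : undef;
}

for my $text ('\\Chi_2', '\\Chix+3', '\\alpha') {
    print "$text -> ", macro_name($text), "\n";
}
```

Once the full name is captured, checking it against the keyword of interest is a plain string comparison.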
Re^2: Defining Characters in Word Boundary? by iaw4 (Monk) on Jan 20, 2011 at 14:00 UTC
Thanks, this is what I needed to learn. I did not know about the extended regex constructs in the Camel book (i.e., the `(?...)` sequences; chapter 5, table 5.6). Is there a meaningful difference between `(?![a-z])` and `(?=[^a-z])`? Is the former recommended? /iaw
Re^3: Defining Characters in Word Boundary? by ikegami (Patriarch) on Jan 20, 2011 at 16:32 UTC
Compare `'ab' =~ /a(?!a)/` and `'a' =~ /a(?!a)/` with `'ab' =~ /a(?=[^a])/` and `'a' =~ /a(?=[^a])/`.
Re^3: Defining Characters in Word Boundary? by Jim (Curate) on Jan 20, 2011 at 17:17 UTC
"Is there a meaningful difference between `(?![a-z])` and `(?=[^a-z])`? Is the former recommended?" Yes, they're different regular expression patterns that match different things. `(?![a-z])` asserts "not followed by any of the characters 'a' through 'z'", which includes not being followed by any character at all (the end of the string). `(?=[^a-z])` asserts "followed by a single character that is not one of the characters 'a' through 'z'". The former is a negative assertion; the latter is a positive assertion. In your case, `(?![a-z])` is what you want. [PerlMonks posting tip: Enclose Perl code in `<code></code>` tags, even code within paragraphs.] UPDATE: Removed color.
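To make the difference concrete for LaTeX names, a small sketch contrasting the two lookaheads; `$neg`, `$pos`, and the sample strings are illustrative. The two patterns disagree only when the name ends the string.

```perl
use strict;
use warnings;

# Negative lookahead: succeeds if the next character is not a letter,
# including when there is no next character (end of string).
my $neg = qr/\\Chi(?![a-zA-Z])/;

# Positive lookahead on a negated class: needs an actual next character
# that is not a letter, so it fails at the end of the string.
my $pos = qr/\\Chi(?=[^a-zA-Z])/;

for my $text ('\\Chi_2', '\\Chi', '\\Chix') {
    printf "%-7s neg: %-8s pos: %s\n", $text,
        ($text =~ $neg ? 'match' : 'no match'),
        ($text =~ $pos ? 'match' : 'no match');
}
```

A macro at the very end of a line or file is common in LaTeX source, which is why the negative form is the safer choice here.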
Re^4: Defining Characters in Word Boundary? by AnomalousMonk (Archbishop) on Jan 20, 2011 at 23:28 UTC
Re^5: Defining Characters in Word Boundary? by Jim (Curate) on Jan 21, 2011 at 01:06 UTC
Re^4: Defining Characters in Word Boundary? by Anonymous Monk on Jan 20, 2011 at 18:08 UTC
Re: Defining Characters in Word Boundary? by Jim (Curate) on Jan 20, 2011 at 01:16 UTC
"Is it possible to define the characters that `\b` matches?" I believe so, though I've never tried it myself. See "Creating Custom RE Engines" in perlre.
Re: Defining Characters in Word Boundary? by luis.roca (Deacon) on Jan 19, 2011 at 22:59 UTC
"Is it possible to define the characters that `\b` matches? I am processing LaTeX code, and its macro character space is `[a-zA-Z]`. I would like to write `\\$keyword\b`" Unless I'm misunderstanding your intentions, that's the purpose of `\b`. Example: `m/\bChi_2\b/` I don't think the underscore will cause you problems within the `\b ... \b` pair, but I'm sure I'll be corrected shortly if I'm wrong. :) UPDATE: 1.20.2011 1:30PM Through help in the chatterbox and "Mastering Regular Expressions" p. 89, I learned that `\w` has included `_` since Perl 2. So `/\bChi_2\b/` will not match. "...the adversities born of well-placed thoughts should be considered mercies rather than misfortunes." - Don Quixote
Re^2: Defining Characters in Word Boundary? by ikegami (Patriarch) on Jan 19, 2011 at 23:17 UTC
I believe he's saying that "`_2`" isn't part of the macro, so `$keyword = 'Chi'; '...\\Chi_2...' =~ /\\$keyword\b/` should match.
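A quick check of why the OP's `\\$keyword\b` falls short in exactly this case; the variable names are illustrative. With `\b`, the underscore counts as part of the word, so the desired match fails, while restricting name characters to `[a-zA-Z]` gives the intended result.

```perl
use strict;
use warnings;

my $keyword = 'Chi';
my $text    = '...\\Chi_2...';

# \b does not fire between 'i' and '_', because '_' is a \w character.
my $with_b = $text =~ /\\\Q$keyword\E\b/ ? 1 : 0;

# Treating only [a-zA-Z] as name characters gives the intended match.
my $with_lookahead = $text =~ /\\\Q$keyword\E(?![a-zA-Z])/ ? 1 : 0;

print "with \\b: $with_b, with lookahead: $with_lookahead\n";
```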
Re^3: Defining Characters in Word Boundary? by luis.roca (Deacon) on Jan 19, 2011 at 23:39 UTC
Ugh! Apologies to the OP. I misread that.
Re^2: Defining Characters in Word Boundary? by ikegami (Patriarch) on Jan 21, 2011 at 01:42 UTC
I'm not sure why you added the update. I don't see what it adds, and it's not true. `/\bChi_2\b/` will match plenty of strings: `'Chi_2' =~ /\bChi_2\b/  # Match`, `'!Chi_2!' =~ /\bChi_2\b/  # Match`. Maybe you had a specific string in mind, but I don't see how this relates to the OP. He would not use `Chi_2` in the regex pattern. In a world where an identifier matches `/^\w+\z/`, you might do something like `($_ = '\\Chi+3'  ) =~ s/\\Ch\b/$ch/g;  # Won't replace`, `($_ = '\\Chi+3'  ) =~ s/\\Chi\b/$chi/g;  # Will replace`, `($_ = '\\Chi_2+3') =~ s/\\Chi\b/$chi/g;  # Won't replace`. But what if identifiers match `/^[a-zA-Z]+\z/`? You'd want the following behaviour: `($_ = '\\Chi+3'  ) =~ s/\\Ch???/$ch/g;  # Won't replace`, `($_ = '\\Chi+3'  ) =~ s/\\Chi???/$chi/g;  # Will replace`, `($_ = '\\Chi_2+3') =~ s/\\Chi???/$chi/g;  # Will replace`. That's the OP's question. As I've already mentioned, I recommend extracting the identifier, then checking if it's one of interest. This can be as simple as the following: `s/\\([a-zA-Z]+)/ exists($vars{$1}) ? $vars{$1} : "\\$1" /eg` The technique scales well, and it avoids the problem of matching something you've previously replaced.
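A runnable sketch of the extract-then-check substitution; the contents of `%vars` and the `expand` wrapper are hypothetical, added only to make the one-liner self-contained.

```perl
use strict;
use warnings;

# Hypothetical table: macro name => replacement text.
my %vars = (Chi => 'X', Ch => 'Y');

# One pass: capture the whole macro name, then look it up.
# Unknown macros are written back unchanged as "\name".
sub expand {
    my ($text) = @_;
    $text =~ s/\\([a-zA-Z]+)/ exists($vars{$1}) ? $vars{$1} : "\\$1" /eg;
    return $text;
}

print expand('\\Chi+3'), "\n";      # X+3
print expand('\\Chi_2+3'), "\n";    # X_2+3
print expand('\\Chix+\\Ch'), "\n";  # \Chix+Y
```

Because `s///g` makes a single left-to-right pass and never rescans replacement text, a replacement value that itself looks like a macro is not expanded again, and `\Ch` is never mistaken for a prefix of `\Chi` or `\Chix`.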
Re^3: Defining Characters in Word Boundary? by luis.roca (Deacon) on Jan 21, 2011 at 04:45 UTC
"Maybe you had a specific string in mind, but I don't see how this relates to the OP. He would not use Chi_2 in the regex pattern." I did have a very similar string in mind. "In a world where an identifier matches `/^\w+\z/`, you might do something like `($_ = '\\Chi_2+3') =~ s/\\Chi\b/$chi/g;  # Won't replace`" I understand my update isn't contributing to the OP's original question. I'm not trying to distract from his post or the thread, simply attempting to correct what I said about the underscore having no effect on the regex's success (again, the one I had in mind). In my original reply I was referring to matching 'Chi' within 'Chi_2' using `\b`. I previously said that I didn't think the underscore would be a problem. However, after some help in the CB from erix and Tanktalus, it was shown that an underscore would interfere with this particular match: `say (("Chi_2" =~ /\bChi\b/) ? "match" : "no match");` prints "no match".* * Thanks again to Tanktalus for this control structure. Again, apologies for any confusion caused.
Re^4: Defining Characters in Word Boundary? by ikegami (Patriarch) on Jan 21, 2011 at 07:12 UTC

