Interpolate Text Not Inside a Certain Tag

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

OK, the title may be confusing, but I don't know how to write a summary for this problem.

I'm writing a really simple Wiki-like engine. The problem is, Wiki markup behaves much like a Perl double-quoted string (everything gets interpreted), but sometimes you want single-quote behavior:

  This is **bold**, but ``this is **not** bold``.

becomes

  This is <b>bold</b>, but this is **not** bold.

In short, I want the text between two pairs of backticks not to be processed. I can't think of any way to do this with simple regexps, so please help me. :)

By the way, I'm using something like s/\*\*(.+?)\*\*/<b>$1<\/b>/gs for the tag processing; does anyone have a better idea?


Replies are listed 'Best First'.
Re: Interpolate Text Not Inside a Certain Tag by dragonchild (Archbishop) on Apr 07, 2005 at 17:17 UTC
"I can't think of any way to do this with simple regexps, . . ."

Then don't use a regex. Why does everyone think that all string manipulation should be handled with a regex?!? Use a simple character-by-character parser, noting state as you move through the string.

  # UNTESTED !!!
  # Note: this toggles on a single ' and a single *, not on the
  # `` and ** pairs used in the question.
  my $in_quote = 0;
  my $in_bold = 0;
  my $final_string = '';
  foreach my $char ( split //, $string ) {
      if ( $char eq "'" ) {
          $in_quote = 1 - $in_quote;
      }
      elsif ( $char eq '*' && !$in_quote ) {
          $final_string .= '<';
          $final_string .= $in_bold ? '/' : '';
          $final_string .= 'b>';
          $in_bold = 1 - $in_bold;
      }
      else {
          $final_string .= $char;
      }
  }
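The character-by-character idea above can be adapted to the two-character delimiters from the question. What follows is only a minimal sketch, not dragonchild's code: the name render_by_scanning is hypothetical, and it assumes `` and ** are the only markup and that an unclosed ** may simply leave a <b> open.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): scan the text, looking at
# two characters at a time so that `` and ** are seen as single tokens.
sub render_by_scanning {
    my ($text) = @_;
    my ($in_quote, $in_bold, $out) = (0, 0, '');
    my $i = 0;
    while ($i < length $text) {
        my $pair = substr $text, $i, 2;
        if ($pair eq '``') {
            # Toggle literal mode; the delimiter itself is dropped.
            $in_quote = !$in_quote;
            $i += 2;
        }
        elsif ($pair eq '**' && !$in_quote) {
            # Outside literal mode, ** opens or closes a bold span.
            $out .= $in_bold ? '</b>' : '<b>';
            $in_bold = !$in_bold;
            $i += 2;
        }
        else {
            # Everything else (including * inside ``...``) is copied as-is.
            $out .= substr $text, $i, 1;
            $i++;
        }
    }
    return $out;
}

print render_by_scanning('This is **bold**, but ``this is **not** bold``.'), "\n";
# prints: This is <b>bold</b>, but this is **not** bold.
```

Adding another kind of markup means adding another elsif branch and another state flag, which is where this approach starts to grow.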
Using Multiple m/\G.../gc to Tokenize by ikegami (Patriarch) on Apr 07, 2005 at 17:28 UTC
An easily extendable solution:

  {
      for ($text) {  # Alias $_ to $text.
          /\G ``   (.*?) ``   /gcsx && do { print($1);          redo };
          /\G \*\* (.*?) \*\* /gcsx && do { print("<b>$1</b>"); redo };

          /\G (
              .              # Catchall.

              (?:            # These four lines are optional.
                  (?!``)     # They are here to speed things up
                  (?!\*\*)   # by avoiding calling print for
              .)*            # single characters.

          ) /gcsx && do { print($1); redo };
      }
  }

Handles mismatched ** and `` by treating them as normal characters.

Update: Removed "\n"s from prints. Added /s option to regexps, since I'm guessing newlines are not special. The 1st (.*?) was ((?:(?!``).)*). The 2nd (.*?) was ((?:(?!\*\*).)*).
Re: Interpolate Text Not Inside a Certain Tag by tlm (Prior) on Apr 07, 2005 at 17:29 UTC
Regexp::Common::balanced
Re: Interpolate Text Not Inside a Certain Tag by jonadab (Parson) on Apr 07, 2005 at 17:25 UTC
First off, dragonchild's answer is well worth considering, and probably the better choice. But for the sake of interest... I think it may be possible to do this with a regex, provided the problem really is as simple as the way you have stated it and not complicated by additional nesting or somesuch. Something along these lines...

  s!(?:(?:([']{2})([^']+)[']{2})|(?:([*]{2})([^']+)[*]{2}))!($3 eq '**')?"<b>$4</b>":$2!ge;

...might work. (No, lots of parentheses don't bother me. Yep, I knew a lisp variant before I learned Perl.) But dragonchild's solution is easier to read and maintain.

update: fixed silly paren-counting error
Re^2: Interpolate Text Not Inside a Certain Tag by ikegami (Patriarch) on Apr 07, 2005 at 17:37 UTC
It doesn't work for the simple case "**bold**". I haven't tried anything else.
Re: Interpolate Text Not Inside a Certain Tag by jonadab (Parson) on Apr 09, 2005 at 00:17 UTC
Yeah, I wasn't thinking and used the match variables as if there were only two sets of parens, rather than four. The updated version works for that simple case. However, I'd worry about unexpected data screwing it up potentially rather badly, and it only handles one type of quote mark; if you're allowed to have both single and double quote marks and nest them and escape quote marks within quotes with backslashes, stuff gets messy fast.
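For comparison, the single-substitution idea can be written with one alternation and the /e modifier, using the `` and ** delimiters from the original question. This is a rough sketch rather than jonadab's regex: render_with_alternation is a made-up name, and it assumes delimiters are balanced and never nested.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread). Whichever alternative
# matches first at a given position wins, so ** inside ``...`` is
# consumed as part of the literal span and never seen as bold markup.
sub render_with_alternation {
    my ($text) = @_;
    $text =~ s{
        ``   (.*?) ``      # group 1: literal span, emitted untouched
      |
        \*\* (.+?) \*\*    # group 2: bold span
    }{ defined $1 ? $1 : "<b>$2</b>" }gsex;
    return $text;
}

print render_with_alternation('This is **bold**, but ``this is **not** bold``.'), "\n";
# prints: This is <b>bold</b>, but this is **not** bold.
```

The caveat raised above still applies: with mismatched delimiters (say, a ** that opens before a `` span and closes inside it) the result is whatever the leftmost match happens to produce.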
Re: Interpolate Text Not Inside a Certain Tag by Anonymous Monk on Apr 07, 2005 at 17:55 UTC
Alright, thanks all! I've already thought of using a "character-by-character parser" as dragonchild suggested, but I guess Perl scripts are just less sexy when you have to use techniques too common for other languages ;). Just kidding; the truth is, as this is meant to be a really simple script (for personal use), regex seemed to be a good option: short and, well, usually simple. Of course, I'd choose char-by-char parsing over using overly-complicated regular expressions, so, anyway, thanks again.
Re: Interpolate Text Not Inside a Certain Tag by satchm0h (Beadle) on Apr 07, 2005 at 20:39 UTC
I realize you already have a solution, but what about this:

  sub boldify {
      my $input = shift;
      # Split on ``; even-numbered parts lie outside the literal spans.
      my @parts = split /``/, $input;
      foreach my $i (0 .. $#parts) {
          $parts[$i] =~ s/\*\*(.+?)\*\*/<b>$1<\/b>/gs if $i % 2 == 0;
      }
      return join '', @parts;
  }
Re^2: Interpolate Text Not Inside a Certain Tag by Anonymous Monk on Apr 11, 2005 at 14:39 UTC
Quite interesting, but I don't think it's flexible enough. For example, if later I decide that `` enclosed by spaces ( /\s``\s/ ) shouldn't be recognized as an "escape mark", well, how can we detect it?
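One possible answer to that last question, offered only as a sketch: the split-based approach can stay, with lookaround assertions in the split pattern so that a `` with whitespace on both sides is not treated as a delimiter. The name render_with_split is hypothetical, and the code assumes the remaining `` marks are balanced.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread).
sub render_with_split {
    my ($text) = @_;
    # `` is a delimiter unless it has whitespace on BOTH sides:
    # either it is not preceded by whitespace, or not followed by it.
    # The -1 limit keeps trailing empty fields so the parity stays right.
    my @parts = split /(?<!\s)``|``(?!\s)/, $text, -1;
    for my $i (0 .. $#parts) {
        next if $i % 2;    # odd-numbered parts sat between delimiters
        $parts[$i] =~ s{\*\*(.+?)\*\*}{<b>$1</b>}gs;
    }
    return join '', @parts;
}

print render_with_split('**a** ``**b**``'), "\n";   # prints: <b>a</b> **b**
print render_with_split('x `` **y** `` z'), "\n";   # prints: x `` <b>y</b> `` z
```

In the second call neither `` counts as an escape mark, so it stays in the output and the ** between them is still processed.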
