http://qs321.pair.com?node_id=196268

IOrdy has asked for the wisdom of the Perl Monks concerning the following question:

Rather than explain the following snippet should give you an idea of what I'm trying to do. I just can't quite get it going and respectfully ask for the help of the monks.

a) Is this right? I'm just trying to save writing the almost the same regex for every tag.
b) I can't figure out how to get the matched variables ($1, $2 etc..) into the text and I'm fresh out of ideas.

Code:
# ... my %taglist_complex = ( 'email' => "<a href='mailto:$1'>$2</a>", 'url' => "<a href='$1'>$2</a>", 'img' => "<img src='$1' alt='$2' />" ); my %taglist_simple = ( 'b' => '<b>$1</b>', 'i' => '<i>$1</i>', 'u' => '<u>$1</u>', 'url' => '<a href=\'$1\'>$1</a>', 'img' => '<img src=\'$1\' alt=\'$1\' />' ); # Process Complex Tags -> [url="http://www.foo.bar.com"]foobar[/ur +l] while (my ($tag, $regex) = each %taglist_complex) { $post =~ s/\[\s*$tag\s*=\s*['"]?([\w\W][^\]]*?)['"]?\](.[^\[|\ +]]*?)\[\s*\/\s*$tag\s*\]/$regex/gi; } # Process Simple Tags -> [url]http://www.foo.bar.com[/url] while (my ($tag, $regex) = each %taglist_simple) { $post =~ s/\[\s*$tag\s*\](.[^\[|\]]*?)\[\s*\/\s*$tag\s*\]/$reg +ex/gi; } #...

Replies are listed 'Best First'.
Re: Drop in regex replacements?
by Ovid (Cardinal) on Sep 09, 2002 at 15:02 UTC

    One relatively simple way to accomplish this would be to use subrefs with the 'e' switch (though I'd still use a parser instead):

    #!/usr/bin/perl -w use strict; my $foo = sub { "$1 years ago" }; my $bar = 'Four score and perhaps seven'; $bar =~ s/perhaps (seven)/&$foo/e; print $bar;

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

      Thanks Ovid that makes more sense.

      My origional code was in php and that did support adding the replace part of the regex as a string from somewhere else (i.e. a hash). I was thinking if preg_replace() can do it why can't a real perl replace regex do it :-)
Re: Drop in regex replacements?
by Joost (Canon) on Sep 09, 2002 at 14:50 UTC
    Rather than explain the following snippet should give you an idea of what I'm trying to do. I just can't quite get it going and respectfully ask for the help of the monks.
    Rather than explain it, here is my incomplete and untested (for lack of test data) solution:

    sub replace { my ($tag,$attr,$content) = @_; $attr ||= $content; if ($tag eq 'b') { return "<b>$content</b>"; } elsif ($tag eq 'url') { return "<a href='$attr'>$content</a>"; } # etc. etc. etc. } s/\[(\w+)=?(\w*)\](\w+)\[\1\]/replace($1,$2,$3)/esg;
    Hope this helps :-)

    Update:
    Ok, let's explain it a little anyway: I've turned your problem inside out, so you only need 1 regex and only 1 s statement. This has 3 advantages:

    • You don't have to run an additional s statement on the whole text for each new tag you decide to add.
    • You only need to update and debug 1 regular expression if you change the overall syntax.
    • You get rid of the $1 - problem.
    You might need a little finetuning on this code (I haven't tested it except for syntax) but I think this is a lot easier and maintainable way of solving your problem.
    -- Joost downtime n. The period during which a system is error-free and immune from user input.
      Sorry, I thought it would be quite obvious because above each regex is an example of what I am parsing. :-(
      # Process Complex Tags -> [url="http://www.foo.bar.com"]foobar[/url]
      What you suggested would be longer than just cutting/pasting and running a complete regex for every tag. shame.
        Please read my above explanation for what I am doing here.

        I am changing the problem around to make it easier to maintain and expand. If you really need to have compact code, you can change the &replace sub like this:

        my %replaces = ( email => sub { "<a href=mailto:'$_[0]'>$_[1]</a>" }, url => sub { "<a href='$_[0]'>$_[1]</a>'" }, # etc etc etc ); sub replace { my ($tag,$attr,$content) = @_; $attr ||= $content; return $replaces{$tag}->($attr,$content); }
        I was merely trying to make the code as clean and clear as I could.
        -- Joost downtime n. The period during which a system is error-free and immune from user input.
Re: Drop in regex replacements?
by zigdon (Deacon) on Sep 09, 2002 at 14:31 UTC
    I think I understand what you're trying to do here. While I don't have the brain cycles to process the exact regexp you have there, could it be that the problem you're seeing is just because you're using double quotes (") when you should be using a single quote ('), in the hash definition?
    my %taglist_complex = ( 'email' => "<a href='mailto:$1'>$2</a>", 'url' => "<a href='$1'>$2</a>", 'img' => "<img src='$1' alt='$2' />" );
    $1 and $2 are getting extrapolated in this definitions, while I think you mean to have them there as literals.

    -- Dan

      The problem with double quotes being that they will try and interpolate the $1 right there and then (well thats what I thought it was trying to do). Regardless I did try that and it's not the problem.

      I should also note that the regex's actually do work fine (though they may not be 100% correct as I'm new to the sport) it's just I get a literal '<b>$1</b>' as the replacement.
Re: Drop in regex replacements?
by Util (Priest) on Sep 09, 2002 at 19:43 UTC

    If I understand your input data, then the code below contains two working solutions. The 'tagsub' solution is very similar to Joost's concept. Even on an old 200MHz box, either solution seems quite fast; perhaps I misunderstood your issue with Joost's solution:

    • 5500 tags/sec - taglist
    • 4800 tags/sec - tagsub

    #!/usr/bin/perl -w use strict; my $data = <<'EOF'; [url="http://www.foo.bar.com"]foobar[/url] [url]http://www.foo.bar.com[/url] [img]http://example.com/prettylady.gif[/img] [img="http://example.com/prettylady.gif"]Pretty Lady[/img] [email="bruce.gray@acm.org"]Util[/email] [b]bold[/b] [i]ital[/i] [u]under[/u] [em]emph[/em] [strong]smelly[/strong] EOF my %taglist = ( '' => { # Simple b => sub { qq{<b>$_[1]</b>} }, i => sub { qq{<i>$_[1]</i>} }, u => sub { qq{<u>$_[1]</u>} }, em => sub { qq{<em>$_[1]</em>} }, strong => sub { qq{<strong>$_[1]</strong>} }, url => sub { qq{<a href="$_[1]">$_[1]</a>} }, img => sub { qq{<img src="$_[1]" alt="$_[1]" />} }, }, '=' => { # Complex email => sub { qq{<a href="mailto:$_[0]">$_[1]</a>} }, url => sub { qq{<a href="$_[0]">$_[1]</a>} }, img => sub { qq{<img src="$_[0]" alt="$_[1]" />} }, }, ); sub tagsub { my $tag = shift; my $equals_sign = shift; if ( not $equals_sign ) { # Simple return qq{<b>$_[1]</b>} if $tag eq 'b'; return qq{<i>$_[1]</i>} if $tag eq 'i'; return qq{<u>$_[1]</u>} if $tag eq 'u'; return qq{<em>$_[1]</em>} if $tag eq 'em'; return qq{<strong>$_[1]</strong>} if $tag eq 'strong'; return qq{<a href="$_[1]">$_[1]</a>} if $tag eq 'url'; return qq{<img src="$_[1]" alt="$_[1]" />} if $tag eq 'img'; } else { # Complex return qq{<a href="mailto:$_[0]">$_[1]</a>} if $tag eq 'email'; return qq{<a href="$_[0]">$_[1]</a>} if $tag eq 'url'; return qq{<img src="$_[0]" alt="$_[1]" />} if $tag eq 'img'; } } my $pat = qr{\[(\w+)(=?)['"]?([^\]]*?)['"]?\](.+?)\[/\1\]}; $_ = $data; s/$pat/$taglist{$2}{$1}->($3,$4)/esg; print "Run # 1:\n$_\n"; $_ = $data; s/$pat/tagsub($1,$2,$3,$4)/esg; print "Run # 2:\n$_\n"; # Boost size of $data to 10240 lines. #$data .= $data for 1 .. 10; # #use Benchmark qw(cmpthese); #cmpthese( # -300, # { # hash_sub => sub{ # $_ = $data; # s/$pat/$taglist{$2}{$1}->($3,$4)/esg; # }, # one_sub => sub{ # $_ = $data; # s/$pat/tagsub($1,$2,$3,$4)/esg; # }, # } #);