http://qs321.pair.com?node_id=11119211


in reply to Modifying muliple matched strings in text

Though it seems to work, I have a hunch this isn't ideal.

Of course, the best way is to use a proper parser. In this case, the original Markdown implementation is itself a Perl script, and the regex extracted from it looks like the following (modified slightly to put it into qr// form). There are now many Markdown variants and parsers, and probably several that are more robust than this "just a regex" approach, but as long as your Markdown input is simple enough, it should be OK. <update> There are caveats to this approach, though; for example, the following regex will also operate on links inside code blocks! As usual, more representative sample data will result in more accurate solutions :-) </update>

my $g_nested_brackets;
$g_nested_brackets = qr{
    (?>                                 # Atomic matching
        [^\[\]]+                        # Anything other than brackets
      |
        \[
            (??{ $g_nested_brackets })  # Recursive set of nested brackets
        \]
    )*
}x;

my $anchors = qr{
    (                                   # wrap whole match in $1
        \[
            ($g_nested_brackets)        # link text = $2
        \]
        \(                              # literal paren
            [ \t]*
            <?(.*?)>?                   # href = $3
            [ \t]*
            (                           # $4
                (['"])                  # quote char = $5
                (.*?)                   # Title = $6
                \5                      # matching quote
            )?                          # title is optional
        \)
    )
}xs;
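To see what the recursive part accepts on its own, here is a minimal sketch that anchors the same sub-pattern between one pair of literal brackets. The names `$nested` and `bracket_contents` are made up for this illustration, and it assumes a reasonably recent Perl (5.18 or later, where lexicals inside `(??{ })` behave reliably).

```perl
use strict;
use warnings;

# Same shape as $g_nested_brackets: runs of non-bracket characters, or a
# bracketed group whose contents are matched by the pattern itself.
my $nested;
$nested = qr{
    (?>
        [^\[\]]+
      |
        \[ (??{ $nested }) \]
    )*
}x;

# Hypothetical helper: returns the text between the outermost brackets,
# or undef if the whole string is not one balanced [...] group.
sub bracket_contents {
    my ($s) = @_;
    return $s =~ m{ \A \[ ($nested) \] \z }x ? $1 : undef;
}

for my $s ('[a [b [c]] d]', '[a [b] d', '[]') {
    my $inner = bracket_contents($s);
    print "$s => ", defined $inner ? "balanced, contents '$inner'" : "not balanced", "\n";
}
```

The `$anchors` pattern above uses the sub-pattern the same way: it sits between a literal `\[` and `\]` and captures the link text, so link text may itself contain balanced brackets.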

I've taken this regex and modified it to modernize it a bit and only capture the things we're interested in:

use warnings;
use strict;

my $anchors = qr{
    (?(DEFINE)
        (?<nested_brackets>
            (?> [^\[\]]+ | \[ (?&nested_brackets) \] )*
        )
    )
    \[ (?<text> (?&nested_brackets) ) \]
    \(
        (?<link>
            [ \t]* <? .*? >? [ \t]*
            (?: (?<titlequote>['"]) .*? \k<titlequote> )?
        )
    \)
}xs;

my $input = <<'END';
blah blah [click me](click me) more stuff
blah [link here](link here) blah blah
END

my $expect = <<'END';
blah blah [click me](/click-me) more stuff
blah [link here](/link-here) blah blah
END

(my $output = $input) =~ s{$anchors}{
    my ($t, $l) = @+{qw/ text link /};
    $l =~ s/\s+/-/g;
    "[$t](/$l)"
}ge;

use Test::More tests=>1;
is $output, $expect;
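On the code-block caveat from the update: one possible workaround, sketched here under some loud assumptions, is to match code first and put it back unchanged. The sketch only knows about single-backtick inline spans (not fenced or indented blocks), and `$link` is a deliberately simplified stand-in for `$anchors` (no nested brackets, no titles). The names `$code_span`, `slug_link` and `rewrite_links` are invented for the example.

```perl
use strict;
use warnings;

# Simplified link pattern, standing in for $anchors to keep the sketch short.
my $link = qr{ \[ (?<text> [^\]]* ) \] \( (?<href> [^)]* ) \) }x;

# Assumption: code is marked only by single-backtick inline spans.
my $code_span = qr{ ` [^`]* ` }x;

# Hypothetical helper: turns whitespace in the target into dashes.
sub slug_link {
    my ($text, $href) = @_;
    $href =~ s/\s+/-/g;
    return "[$text](/$href)";
}

sub rewrite_links {
    my ($markdown) = @_;
    # The code-span alternative comes first, so a backtick span is consumed
    # whole and re-emitted as-is before the link pattern can look inside it.
    $markdown =~ s{ (?<code> $code_span ) | $link }{
        # Copy the captures right away: a later successful match would reset %+.
        my ($code, $text, $href) = @+{qw/ code text href /};
        defined $code ? $code : slug_link($text, $href)
    }gex;
    return $markdown;
}

print rewrite_links("see [click me](click me), not `[click me](click me)`\n");
```

The same alternation trick should carry over to the full `$anchors` regex, but a real Markdown parser is still the more robust answer once code blocks enter the picture.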

Re^2: Modifying muliple matched strings in text (updated)
by nysus (Parson) on Jul 14, 2020 at 13:32 UTC

    Holy cow, there's some really advanced regex stuff going on here I've never seen before. I will study this closely and pick up some new tricks. Thanks.

    I thought about researching a markdown parser but I figured it was easier, for now, just to roll my own before going down a big wormhole.

    $PM = "Perl Monk's";
    $MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
    $nysus = $PM . ' ' . $MCF;