http://qs321.pair.com?node_id=11119206

nysus has asked for the wisdom of the Perl Monks concerning the following question:

In the following piece of markdown text, I want to replace the spaces with a hyphen and also make the link absolute with a slash in front. So I want to go from:

blah blah [click me](click me) more stuff blah [link here](link here) blah blah ...

to:

blah blah [click me](/click-me) more stuff blah [link here](/link-here) blah blah ...

So this is as far as I got and I'm stymied:

#!/usr/bin/env perl use strict; use warnings; my $text = "blah blah [click me](click me) more stuff\nblah [link here +](link here) blah blah"; $text =~ s/(\[[^]]+]\()/$1\//g; # make links absolute $text =~ /(\[[^]]+]\(\/)([^\)]+)/; my $part1 = $1; my $part2 = $2; my $orig_part2 = $part2; $part2 =~ s/ /-/g; $text =~ s/\Q$part1$orig_part2\E/$part1$part2/g;

The obvious problem with this code is that only the first link gets the space replaced:

blah blah [click me](/click-me) more stuff blah [link here](/link here) blah blah

I'm drawing a blank on how I might loop through all matches to links and modify those matches. Or maybe there is an entirely better way of pulling this off. Thanks!

UPDATE:

OK, I noodled around with this some more. On a lark, I tried a "global" match (which I didn't know existed outside substitution), and came up with this:

my $text = "blah blah [click me](click me) more stuff\nblah [link here +](link here) blah blah"; $text =~ s/(\[[^]]+]\()/$1\//g; # make links absolute my @matches = $text =~ /(\[[^]]+]\(\/)([^\)]+)/g; while (@matches) { my $part1 = shift @matches; my $part2 = shift @matches; my $orig_part2 = $part2; $part2 =~ s/ /-/g; $text =~ s/\Q$part1$orig_part2\E/$part1$part2/g; }

Though it seems to work, I have a hunch this isn't ideal.

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
$nysus = $PM . ' ' . $MCF;
Click here if you love Perl Monks

Replies are listed 'Best First'.
Re: Modifying muliple matched strings in text (updated)
by haukex (Bishop) on Jul 12, 2020 at 13:36 UTC
    Though it seems to work, I have a hunch this isn't ideal.

    Of course, the best way is to use a proper parser. In this case, the original Markdown is a Perl script itself, and the regex extracted from it looks like the following (modified slightly to put it into qr// form). Of course, now there are tons of different Markdown variations and parsers, and probably several that are more robust than this "just a regex" parser, but as long as your Markdown input is simple enough, it should probably be ok. <update> There are caveats to this approach, though - for example, the following regex will also operate on code blocks! As usual, more representative sample data will result in more accurate solutions :-) </update>

    my $g_nested_brackets; $g_nested_brackets = qr{ (?> # Atomic matching [^\[\]]+ # Anything other than brackets | \[ (??{ $g_nested_brackets }) # Recursive set of nested brackets \] )* }x; my $anchors = qr{ ( # wrap whole match in $1 \[ ($g_nested_brackets) # link text = $2 \] \( # literal paren [ \t]* <?(.*?)>? # href = $3 [ \t]* ( # $4 (['"]) # quote char = $5 (.*?) # Title = $6 \5 # matching quote )? # title is optional \) ) }xs;

    I've taken this regex and modified it to modernize it a bit and only capture the things we're interested in:

    use warnings; use strict; my $anchors = qr{ (?(DEFINE) (?<nested_brackets> (?> [^\[\]]+ | \[ (?&nested_brackets) \] )* ) ) \[ (?<text> (?&nested_brackets) ) \] \( (?<link> [ \t]* <? .*? >? [ \t]* (?: (?<titlequote>['"]) .*? \k<titlequote> )? ) \) }xs; my $input = <<'END'; blah blah [click me](click me) more stuff blah [link here](link here) blah blah END my $expect = <<'END'; blah blah [click me](/click-me) more stuff blah [link here](/link-here) blah blah END (my $output = $input) =~ s{$anchors}{ my ($t, $l) = @+{qw/ text link /}; $l =~ s/\s+/-/g; "[$t](/$l)" }ge; use Test::More tests=>1; is $output, $expect;

      Holy cow, there's some really advanced regex stuff going on here I've never seen before. I will study this closely and pick up some new tricks. Thanks.

      I thought about researching a markdown parser but I figured it was easier, for now, just to roll my own before going down a big wormhole.

      $PM = "Perl Monk's";
      $MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
      $nysus = $PM . ' ' . $MCF;
      Click here if you love Perl Monks

Re: Modifying muliple matched strings in text
by LanX (Sage) on Jul 12, 2020 at 13:00 UTC
    Hint:

    Call a function in the substitution part (activated by /e modifier) which returns your transformation.

    s/PATTERN/replace($1,$2)/eg

    Not sure if you even need to pass the matchgroups $1,... into the sub, but it's certainly more readable.

    Edit

    Pattern should match the links [$1]($2) (pseudo syntax).

    I recommend using the /x flag for readability.

    The transformations of $2 should happen inside the replace() sub

    HTH!

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      Very cool. Didn't know that was even possible (well probably read about it once and then forgot). Thanks.

      $PM = "Perl Monk's";
      $MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
      $nysus = $PM . ' ' . $MCF;
      Click here if you love Perl Monks

Re: Modifying muliple matched strings in text
by tybalt89 (Prior) on Jul 12, 2020 at 15:01 UTC
    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11119206 use warnings; my $text = "blah blah [click me](click me) more stuff\nblah [link here +](link here) blah blah"; s!]\(\K.*?\)! '/' . $& =~ tr~ ~-~r !ge for $text; print "$text\n";