Though it seems to work, I have a hunch this isn't ideal.
Of course, the best way is to use a proper parser. In this case, the original Markdown is itself a Perl script, and the regex extracted from it looks like the following (modified slightly to put it into qr// form). There are now many Markdown variants and parsers, and probably several that are more robust than this "just a regex" approach, but as long as your Markdown input is simple enough, it should probably be OK. <update> There are caveats to this approach, though: for example, the following regex will also operate on links inside code blocks! As usual, more representative sample data will result in more accurate solutions :-) </update>
my $g_nested_brackets;
$g_nested_brackets = qr{
    (?>                                 # Atomic matching
        [^\[\]]+                        # Anything other than brackets
      |
        \[
            (??{ $g_nested_brackets })  # Recursive set of nested brackets
        \]
    )*
}x;
my $anchors = qr{
    (                               # wrap whole match in $1
        \[
            ($g_nested_brackets)    # link text = $2
        \]
        \(                          # literal paren
            [ \t]*
            <?(.*?)>?               # href = $3
            [ \t]*
            (                       # $4
                (['"])              # quote char = $5
                (.*?)               # Title = $6
                \5                  # matching quote
            )?                      # title is optional
        \)
    )
}xs;
I've modernized this regex a bit and made it capture only the parts we're interested in:
use warnings;
use strict;
my $anchors = qr{
    (?(DEFINE) (?<nested_brackets>
        (?> [^\[\]]+ | \[ (?&nested_brackets) \] )*
    ) )
    \[ (?<text> (?&nested_brackets) ) \]
    \( (?<link>
        [ \t]* <? .*? >? [ \t]*
        (?: (?<titlequote>['"]) .*? \k<titlequote> )?
    ) \)
}xs;
my $input = <<'END';
blah blah [click me](click me) more stuff
blah [link here](link here) blah blah
END
my $expect = <<'END';
blah blah [click me](/click-me) more stuff
blah [link here](/link-here) blah blah
END
(my $output = $input) =~
    s{$anchors}{
        my ($t, $l) = @+{qw/ text link /};  # named captures from %+
        $l =~ s/\s+/-/g;                    # whitespace runs become hyphens
        "[$t](/$l)"
    }ge;
use Test::More tests=>1;
is $output, $expect;
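As for the caveat about code blocks: one possible workaround is to split the input on fenced code blocks and run the substitution only on the text between them. The following is just a rough sketch under a few assumptions. The helper name rewrite_outside_fences is made up for this example, it only recognizes fences made of three backticks at the start of a line, and it does nothing about indented code blocks or inline code spans. A real parser is still the more robust option.

```perl
use strict;
use warnings;

# Same link regex as above, repeated so this snippet runs on its own.
my $anchors = qr{
    (?(DEFINE) (?<nested_brackets>
        (?> [^\[\]]+ | \[ (?&nested_brackets) \] )*
    ) )
    \[ (?<text> (?&nested_brackets) ) \]
    \( (?<link>
        [ \t]* <? .*? >? [ \t]*
        (?: (?<titlequote>['"]) .*? \k<titlequote> )?
    ) \)
}xs;

# Hypothetical helper (not part of Markdown.pl): rewrite link targets
# everywhere except inside backtick-fenced code blocks.
sub rewrite_outside_fences {
    my ($markdown) = @_;
    # The capturing group makes split() return the fenced blocks too,
    # so plain text and fenced blocks alternate: odd indexes are fences.
    my @parts = split /(^`{3}.*?^`{3}[^\n]*\n?)/ms, $markdown;
    for my $i (0 .. $#parts) {
        next if $i % 2;    # fenced block: leave untouched
        $parts[$i] =~ s{$anchors}{
            my ($t, $l) = @+{qw/ text link /};
            $l =~ s/\s+/-/g;
            "[$t](/$l)"
        }ge;
    }
    return join '', @parts;
}

# Build the sample without literal fence lines in the source.
my $fence  = '`' x 3;
my $sample = join "\n",
    'see [click me](click me)',
    "${fence}perl",
    'my $md = "[keep me](keep me)";',
    $fence,
    'and [link here](link here)',
    '';
print rewrite_outside_fences($sample);
```

The two links outside the fence get their targets rewritten, while the one inside the fenced block is passed through unchanged. An unclosed fence is simply treated as ordinary text here, which may or may not be what you want.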