http://qs321.pair.com?node_id=39217

electronicMacks has asked for the wisdom of the Perl Monks concerning the following question:

I’m maintaining a program that (among other things) reads a users input and, if necessary, inserts the appropriate <a href= > tag. Currently it uses the following regex:

s/((http|ftp|news):\/\/\S*)/<a href=”$1\””>$1<\/a>/g;

The main problem is that this regex will place tags around all urls. This creates problems if the user has already hand coded the tags in their input, such as <a href = http://www.perlmonks.org>Click here</a>

Can any Monks suggest a regular expression that will insert href tags around naked urls, but leave urls within tags alone?

Replies are listed 'Best First'.
Re: Adding href where needed.
by runrig (Abbot) on Oct 31, 2000 at 04:53 UTC
    You can try a negative lookbehind assertion (in perlman:perlre), but it has to be a fixed length pattern, and so won't work if there's spaces around the '=' (so you can first fix that):
    s/href\s*=\s*/href=/ig; s,(?<!(?i:href=))((?:http|ftp|news)://\S*),<a href="$1">$1</a>,g; You might try to think of something besides '\S*' also, in case the user puts in a period after the href, or puts it between parenthesis or ???.
    Update: Actually, if the user DID already make an href, they may or may not have put quotes around the '=' in the url, so maybe the easiest thing to do is rewrite all urls:
    # Now its starting to get ugly! $str =~ s#((?i:<a\s+href\s*=\s*"?)?(?:http|ftp|news)://[^"]+)#substr($ +1,0,1) eq '<' ? $1 : qq!<a href="$1" >$1</a>!#eg;
    I think that's right now (and you still may want to replace that '[^"]' with something else) :-)

    Actually, even that may not work if there's other attributes besides 'href' in the tag, but I'm done. The rest is up to you :-)
Re: Adding href where needed.
by moen (Hermit) on Oct 31, 2000 at 05:29 UTC
    You should maybe look into URI::Heuristic for completing your links. Something like this:
    #!/usr/bin/perl use strict; use URI::Heuristic qw(uf_uristr); my $url = uf_uristr("www.perlmonks.com"); print "<a href=$url>$url</a>\n";
    If the user input is www.perlmonks.com or http://www.perlmonks.com it will give you the proper URI. In this case with html code:
    <a href=http://www.perlmonks.com>http://www.perlmonks.com</a>
    using www.perlmonks.com as input.
    It also does ftp and others.

    ok..it's a bit off what you really asked for but i just noted this module at CPAN and liked it ;o)

    Update:
    #!/usr/bin/perl use URI::Heuristic qw(uf_uristr); use strict; my $url; my $input = "<a href = http://www.perlmonks.com>perlmonks</a>"; #my $input = "www.perlmonks.com"; if ($input =~ /^</) { $input =~ /<a\s+href\s*=\s*(.*)>.*<\/a>/i; $url = uf_uristr("$1"); } else { $url = uf_uristr("$input"); } print "<a href=$url>$url</a>";
    This works just fine, also with resolving the url to an uri. You may also drag out the text by enclosing the last .* and drag out the value of $2.
    The regex should work on it's own without using the module also, me thinks.
Re: Adding href where needed.
by FouRPlaY (Monk) on Oct 31, 2000 at 05:28 UTC
    How about just adding (?:<a href = \")? and (?:<\/a>)? at the begining and end (respectively) of your regexp?

    That's the best suggestion I can give with my Perl knowledge.



    FouRPlaY
    Learning Perl or Going To Die Trying
Re: Adding href where needed.
by Anonymous Monk on Mar 26, 2002 at 15:42 UTC
    my $newcontent = _translateLinks2($initcontent); # sub _translateLinks2 { my $content = shift; my @all = split(/ /,$content); my $skip = 0; my @trans; for my $a (@all) { # print "{ $a }\n"; if ($a =~ m#<a#i) { $skip = 1; } # print "skip : $skip\n"; if ($skip == 0 && $a =~ m#http#i) { if ($a =~ m#^(\n|\r)#) { $a =~ s/\n|\r//gis; $a = qq(\n<a href="$a">$a</a>); } elsif ($a =~ m#(\n|\r)$#) { $a =~ s/\n|\r//gis; $a = qq(<a href="$a">$a</a>\n); } else { $a = qq(<a href="$a">$a</a>); } } elsif ($skip == 0 && $a =~ m#\w+\.com|\w+\.net|\w+\.org|\w+\.gov#i +) { if ($a =~ m#^(\n|\r)#) { $a =~ s/\n|\r//gis; $a = qq(\n<a href="http://$a">http://$a</a>); } elsif ($a =~ m#(\n|\r)$#) { $a =~ s/\n|\r//gis; $a = qq(<a href="http://$a">http://$a</a>\n); } else { $a = qq(<a href="http://$a">http://$a</a>); } } # print "[[ $a ]]\n"; if ($a =~ m#<\/a>#i) { $skip = 0; } push (@trans,$a); } my $translated = join (' ',@trans); return $translated; }