PerlMonks  

How to recognize url in text and convert to hyperlink, unless already in anchor

by Anonymous Monk
on Oct 11, 2004 at 22:18 UTC ( [id://398316]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a form into which the user can type plain text and/or HTML. I need to recognize URLs within this text that have not already been wrapped in an anchor <A HREF> tag, and then wrap them so they end up as hyperlinks. If a URL is already wrapped, I want to leave it alone.

I have an expression that does this for the whole string:

$myformtext =~ s!(http://[^\s]+)!<a href="$1">$1</a>!gi;
But can I somehow apply this only to the portion of the text that is not within an <A HREF> tag? Maybe split it or something? I'm not a Perl expert...

20041011 Edit by ysth: add p and code tags

Replies are listed 'Best First'.
Re: How to recognize url in text and convert to hyperlink, unless already in anchor
by pizza_milkshake (Monk) on Oct 11, 2004 at 23:19 UTC
    $ cat no_href.pl
    #!perl -l
    # ex: set ts=4:

    use strict;
    use warnings;
    use HTML::Parser;
    use URI;

    my (@tagstack, $BUF);

    sub start { # enter into tags
        my ($tag, $attr, $text) = @_;
        $tag .= " href" if ($tag eq "a" && defined $attr->{"href"});
        push @tagstack, $tag;
        output($text);
    }

    sub end { # escape out of tags
        my ($tag, $text) = @_;
        # pop back to the matching open tag ("a" may be stored as "a href")
        pop @tagstack while (@tagstack && $tagstack[-1] !~ /^\Q$tag\E(?: href)?$/);
        pop @tagstack if @tagstack; # actually nuke element we're looking for
        output($text);
    }

    sub text { # handle everything inside and around tags
        my ($text) = @_;
        if (unlinked()) {
            # replace URLs with their linked equivalent if we're not within a link
            $text =~ s{ \b(http://\S+) }{
                "<a href=\"" . URI->new($1)->canonical . "\">$1</a>"
            }gex;
        }
        output($text);
    }

    # are we inside a link right now?
    sub unlinked {
        return not scalar grep { /^a href$/ } @tagstack;
    }

    # add to output buffer
    sub output {
        $BUF .= shift @_;
    }

    # start code
    my $p = HTML::Parser->new(
        "start_h" => [ \&start, "tagname, attr, text" ]
        ,"end_h"  => [ \&end,   "tagname, text" ]
        ,"text_h" => [ \&text,  "dtext" ]
    );
    $p->parse(do{ local $/; <DATA> });
    $p->eof; # flush any text the parser is still buffering
    print $BUF;

    __DATA__
    <a href="">http://linked1.com</a>
    <a style="" href='bob'>http://linked2.com</a>
    <a href="whatever">http://linked3.com</a>
    <a nolink>http://linked4.com</a>
    http://unlinked1
    http://unlinked2.com
    $ perl no_href.pl
    <a href="">http://linked1.com</a>
    <a style="" href='bob'>http://linked2.com</a>
    <a href="whatever">http://linked3.com</a>
    <a nolink><a href="http://linked4.com/">http://linked4.com</a></a>
    <a href="http://unlinked1/">http://unlinked1</a>
    <a href="http://unlinked2.com/">http://unlinked2.com</a>

    perl -e"\$_=qq/nwdd\x7F^n\x7Flm{{llql0}qs\x14/;s/./chr(ord$&^30)/ge;print"

Re: How to recognize url in text and convert to hyperlink, unless already in anchor
by skx (Parson) on Oct 11, 2004 at 22:54 UTC

    All you need to do is look for the links as you are doing, but make sure that the links are preceded and followed by whitespace.

    This will never be true for something inside an A tag.

    (Yes, the real solution is to use a package from CPAN for recognising URLs, and for parsing, but this is a hack on your hack.)

    You could use the following:

    $myformtext =~ s!(\s)(http://\w.*?)(\s)!$1<a href="$2">$2</a>$3!gm;
    Steve
    ---
    steve.org.uk

      This will never be true for something inside an A tag.

      Except when it is of course, like here: bamb
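
To make the objection above concrete, here is a small sketch that runs skx's substitution on two inputs. The helper name `link_by_whitespace` and the `example.com` strings are made up for illustration; only the regex comes from the reply above.

```perl
use strict;
use warnings;

# skx's substitution, wrapped in a hypothetical helper for illustration.
sub link_by_whitespace {
    my ($text) = @_;
    $text =~ s!(\s)(http://\w.*?)(\s)!$1<a href="$2">$2</a>$3!gm;
    return $text;
}

# A bare URL surrounded by whitespace: this is the case the regex targets.
my $plain  = "see http://example.com for details";

# An existing anchor whose link text is padded with spaces.
my $padded = qq{<a href="http://example.com"> http://example.com </a>};

print link_by_whitespace($plain),  "\n";
print link_by_whitespace($padded), "\n";
```

The first string is linked as intended. In the second, the URL inside the anchor is itself surrounded by whitespace, so it gets wrapped again and the result contains an anchor nested inside an anchor. Note also that a URL at the very start or end of the string has no whitespace on one side and is not matched at all.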

Re: How to recognize url in text and convert to hyperlink, unless already in anchor
by pingo (Hermit) on Oct 12, 2004 at 09:33 UTC
    Sounds like you need URI::Find, or am I missing something obvious?
Re: How to recognize url in text and convert to hyperlink, unless already in anchor
by DrHyde (Prior) on Oct 12, 2004 at 08:51 UTC
    Do it in two stages. First unwrap any URLs that the user has already wrapped in <a ...> ... </a> tags. Then wrap all URLs in <a ...> ... </a> tags.

    Alternatively, don't allow your users to put the tags in in the first place! This makes it easier to protect yourself and your users against craziness involving some of the other attributes of the <a> tag, like target, onclick and so on.
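
The two-stage idea above could look roughly like the sketch below. It is not DrHyde's code: the helper name `relink` is hypothetical, and it makes the simplifying assumption that every existing anchor has a quoted, absolute `http://` href and that the original link text may be thrown away in favour of the URL. Anchors with relative hrefs are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: unwrap existing links, then wrap every URL.
sub relink {
    my ($text) = @_;

    # Stage 1: replace each <a href="http://...">...</a> with its bare href.
    # Assumption: the link text is discarded; only the target URL is kept.
    $text =~ s{<a\b[^>]*\bhref\s*=\s*(["'])(http://[^"']*)\1[^>]*>.*?</a>}{$2}gis;

    # Stage 2: every URL is now bare, so wrap them all uniformly.
    $text =~ s{(http://[^\s<>"']+)}{<a href="$1">$1</a>}gi;

    return $text;
}

my $in = 'some link: <a href="http://www.perl.com">http://www.perl.com</a>'
       . ' and http://www.perlmonks.com here';
print relink($in), "\n";
```

Because stage 2 regenerates every anchor from scratch, attributes such as target or onclick that the user typed do not survive, which fits the safety concern raised above. The cost is that custom link text is lost, and a URL followed directly by punctuation will have that punctuation included in the link.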

Re: How to recognize url in text and convert to hyperlink, unless already in anchor
by Anonymous Monk on Oct 13, 2004 at 11:07 UTC
    Use Regexp::Common:
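    use strict;
    use warnings;
    use Regexp::Common qw /URI/;

    # Non-greedy on both sides, so each anchor is matched on its own
    # instead of swallowing everything up to the last </a>
    my $re_a_tag = qr/<a\s+.*?>.*?<\/a>/si ;

    my $html = q` some link: <a href="http://www.perl.com">www.perl.com</a> http://www.perlmonks.com by! `;

    # The capturing group makes split return the anchors as chunks too
    my @chunks = split(/($re_a_tag)/si , $html) ;

    foreach my $chunks_i ( @chunks ) {
        # Leave existing anchors alone
        next if $chunks_i =~ /$re_a_tag/ ;
        # Wrap bare URLs in everything else
        $chunks_i =~ s/($RE{URI}{HTTP})/<a href="$1">$1<\/a>/gsi ;
    }

    $html = join('' , @chunks) ;
    print "$html\n" ;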
    Output:
    some link: <a href="http://www.perl.com">www.perl.com</a> <a href="http://www.perlmonks.com">http://www.perlmonks.com</a> by!
    Enjoy!

    By gmpassos
