(Ovid) Re: Searching for web sites

You may wish to check out the HTML::FromText module. It will, amongst other things, automatically convert URLs to hyperlinks. I've never worked with .plan files, so I can't say for certain whether this is an appropriate solution, but I suspect that it's a good place to start.

Also, if you wish to do it by hand, switching to a different delimiter on your regexes will help you avoid backslashitis. Further, if your URLs are not broken across lines (i.e., they contain no embedded newlines) and contain no spaces, you could try the following (untested) regex as a starting point for conversion:

$newline =~ s#(http://[^.]+\.[^.]+\S+)#<a href="$1">$1</a>#gi;

The above regex assumes that, at minimum, you will have two groups of characters separated by a period after the http:// portion. The negated character classes should actually be replaced by classes that state allowable characters (and if you really want to be anal, I recall that the first allowable character in a domain is different from other allowable characters, but sometimes I get into regex overkill).
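For anyone who wants to try the stricter version, here is a minimal sketch of what replacing the negated classes with allowable-character classes might look like. The rule used for a hostname label (starts and ends with a letter or digit, hyphens only in the middle) and the loose path class are assumptions on my part, and the names $label, $url and linkify are made up for illustration.

```perl
use strict;
use warnings;

# One hostname label: begins and ends with a letter or digit,
# hyphens allowed only in between (assumed rule, not from the post).
my $label = qr/[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/i;

# http://, then at least two labels separated by dots, then an optional path.
# The path class is deliberately loose (assumption): anything except
# whitespace, quotes and angle brackets.
my $url = qr{http://$label(?:\.$label)+(?:/[^\s<>"]*)?}i;

# Hypothetical helper: wrap every matching URL in an anchor tag.
sub linkify {
    my ($line) = @_;
    $line =~ s{($url)}{<a href="$1">$1</a>}g;
    return $line;
}

print linkify("I visit http://www.perlmonks.org daily"), "\n";
```

Since a label cannot end in a dot, a sentence-ending period right after the hostname stays outside the link. A period at the end of a path would still be swallowed, so treat this as a starting point only.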

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just go to the link and check out our stats.


Replies are listed 'Best First'.
RE: (Ovid) Re: Searching for web sites by electronicMacks (Beadle) on Oct 25, 2000 at 03:53 UTC
If you’re using such a thorough regex that checks for dots and allowable characters, you may wish to ditch the http:// completely. People are more likely to list websites in their .plan files without it (for example, I visit perlmonks.org and not I visit http://www.perlmonks.org). Personally I’d feel safe putting anchor tags around anything that looks like xxx.xxx, although you could also include a list of allowable Top Level Domains, something like `@TLDs = ("com","net","org","edu","us","nl","de","it","se","ch","uk","ca","hr","ae","br","jp","be","au","ie","ar","fi","mil","gov","sg","es","mx","no","pt","dk","il","ru","nz","th","pl","id","cy","in","kw","at","za","cn","fr","is","ro","kr","gr","co","ph","bo","hu","cr","pe","cl","tr","arpa","tw","eg","ee","ge","ua","om","ec","hk","ve","ag","cz","ni","to","nu","sm","lt","yu","bg","ba","do","qa","ck","mt","bf","lu","su","bh");`
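A rough sketch of how a TLD list like this could be turned into a matcher for bare hostnames. The list here is a short illustrative subset, and $label, $host and linkify_bare are hypothetical names. The lookbehind that skips email addresses and names already prefixed with a scheme is an assumption of mine, not something from the post.

```perl
use strict;
use warnings;

# Short illustrative subset; the full @TLDs list would go here.
my @TLDs = qw(com net org edu uk de);

# Alternation of the allowed TLDs, escaped defensively.
my $tld_re = join '|', map { quotemeta } @TLDs;

# One hostname label (assumed rule: alphanumeric at both ends).
my $label = qr/[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/i;

# One or more labels followed by a listed TLD. The lookbehind keeps us from
# starting mid-word, after "@" (email) or after "/" (already has a scheme).
my $host = qr{(?<![\w.\@/-])(?:$label\.)+(?:$tld_re)\b}i;

# Hypothetical helper: link bare hostnames, supplying http:// in the href.
sub linkify_bare {
    my ($line) = @_;
    $line =~ s{($host)}{<a href="http://$1">$1</a>}g;
    return $line;
}

print linkify_bare("I visit perlmonks.org daily"), "\n";
```

Things like version numbers (1.5.2) are left alone because their last component is not in the list, which is the main benefit of the list over a plain xxx.xxx match.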
RE: RE: (Ovid) Re: Searching for web sites by mirod (Canon) on Oct 25, 2000 at 15:39 UTC
Isn't this a little dangerous? Any time new TLDs are added you will need to go and change the list, plus I cannot see .cx, home of a bunch of free software projects, in this list. http:// or at least www(\..+)+\.\w+ seem the safest matches.
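A sketch of the prefix-based approach, accepting either an explicit http:// or a leading www. and needing no TLD list. This is a tightened variant rather than the exact pattern above: it uses hostname-label classes instead of `.+`, which would happily run across spaces. The names $label, $link, linkify_prefixed and _anchor are made up for the example.

```perl
use strict;
use warnings;

# One hostname label (assumed rule: alphanumeric at both ends).
my $label = qr/[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/i;

# Either "http://" plus a label, or a literal "www", followed by one or more
# ".label" parts and an optional loosely matched path (assumption).
my $link = qr{(?<![\w./-])(?:http://$label|www)(?:\.$label)+(?:/[^\s<>"]*)?}i;

# Hypothetical helper: build the anchor, adding http:// to the href if missing.
sub _anchor {
    my ($text) = @_;
    my $href = $text =~ m{^http://}i ? $text : "http://$text";
    return qq{<a href="$href">$text</a>};
}

sub linkify_prefixed {
    my ($line) = @_;
    $line =~ s{($link)}{ _anchor($1) }ge;
    return $line;
}

print linkify_prefixed("see www.perlmonks.org for details"), "\n";
```

Bare names such as perlmonks.org are not linked by this version, which is the trade-off for not maintaining a list.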
RE: RE: RE: (Ovid) Re: Searching for web sites by FouRPlaY (Monk) on Oct 25, 2000 at 20:47 UTC
Let's not forget either that InterNIC just released the .god domain.

