Regexes can parse many forms of URLs, including the most
common ones. Here's a regex for HTTP URIs:
(?:(?:http)://(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]*)))?(?:/(?:(?:(?:(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*)(?:/(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))*))(?:[?](?:(?:(?:[;/?:@&=+$,a-zA-Z0-9\-_.!~*'()]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?))?)
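The pattern is easier to follow when it is assembled from named pieces. What follows is only a sketch of that idea: it rebuilds the same structure (host, optional port, optional path and query) with qr// and the /x flag. The variable names and the extract_http_uris helper are made up for illustration and are not part of any module.

```perl
use strict;
use warnings;

# Character-level pieces; all names here are this sketch's own.
my $escaped = qr{%[a-fA-F0-9][a-fA-F0-9]};
my $pchars  = qr{[a-zA-Z0-9\-_.!~*'():\@&=+\$,]+};
my $urics   = qr{[;/?:\@&=+\$,a-zA-Z0-9\-_.!~*'()]+};

# Host: a dotted name, or four dot-separated runs of digits.
my $label    = qr{(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9]};
my $toplabel = qr{[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]};
my $hostname = qr{ (?: $label \. )* (?: $toplabel ) \.? }x;
my $ipv4     = qr{ [0-9]+ \. [0-9]+ \. [0-9]+ \. [0-9]+ }x;
my $host     = qr{ $hostname | $ipv4 }x;

# Path: slash-separated segments, each with optional ;parameters.
my $param   = qr{ (?: $pchars | $escaped )* }x;
my $segment = qr{ $param (?: ; $param )* }x;
my $path    = qr{ $segment (?: / $segment )* }x;
my $query   = qr{ (?: $urics | $escaped )* }x;

# The whole thing: scheme, host, optional port, optional path and query.
my $http_uri = qr{
    http:// (?: $host )
    (?: : [0-9]* )?
    (?: / $path (?: \? $query )? )?
}x;

# Hypothetical helper: return every HTTP URI found in a string.
sub extract_http_uris {
    my ($text) = @_;
    my @uris;
    while ($text =~ /($http_uri)/g) {
        push @uris, $1;
    }
    return @uris;
}

print "$_\n" for extract_http_uris(
    'See http://www.example.com/docs;v=2/index.html?id=1 and http://127.0.0.1:8080/ here.'
);
```

Like the one-line version, this accepts only the http scheme, and it will happily swallow trailing punctuation such as a closing parenthesis or a full stop, since those characters are legal in a path.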
Alternatively, you may want to use the Regexp::Common module:
use Regexp::Common;
print $&, "\n" while $txt =~ /$RE{URI}/g;
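If you only want HTTP URIs, or want the pieces rather than the whole match, the module's HTTP pattern takes a -scheme option and a -keep flag. A small sketch, assuming Regexp::Common is installed; the http_uri_parts helper is a made-up name, and the capture numbering ($1 whole URI, $3 host, $4 port) is as described in the Regexp::Common::URI::http documentation:

```perl
use strict;
use warnings;
use Regexp::Common qw(URI);

# Hypothetical helper: return [uri, host, port] for each http or https URI.
sub http_uri_parts {
    my ($text) = @_;
    # -scheme widens the match to https; -keep turns on the capture groups.
    my $re = $RE{URI}{HTTP}{-scheme => 'https?'}{-keep};
    my @parts;
    while ($text =~ /$re/g) {
        # $4 is undefined when the URI carries no port.
        push @parts, [$1, $3, $4];
    }
    return @parts;
}

for my $p (http_uri_parts('Try https://example.com:8443/a?b=1 or http://perl.org/ today')) {
    my ($uri, $host, $port) = @$p;
    print "$uri (host $host", (defined $port ? ", port $port" : ""), ")\n";
}
```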
Abigail