Re^5: Grab input from the user and Open the file

Hi valerydolce,

Designing your own regex is certainly very good practice. For a production system, though, I'd recommend existing modules, such as Regexp::Common to find URIs and URI to parse them:

use warnings;
use strict;

my $str = <<'END_STR';
I am an example http://www.perlmonks.org/?parent=1176663;node_id=3333 text
that contains <https://perlmonks.pair.com/?node_id=1176651> two URIs
END_STR

use Regexp::Common qw/URI/;
use URI;

while ($str=~/$RE{URI}{-keep}/g) {
    my $uri = URI->new($1);
    print "$uri\n";
    print "  Scheme: ", $uri->scheme, "\n";
    print "    Host: ", $uri->host,   "\n";
    print "    Path: ", $uri->path,   "\n";
    print "   Query: ", $uri->query,  "\n";
}

See the URI documentation for many more ways to access the different parts of a URI. Unfortunately, I noticed that Regexp::Common apparently doesn't match the #fragment part of the URI, so here's an attempt at an alternative, using a regex based on the characters that RFC 3986 allows in URIs.

# NOTE this is based on a quick skim of RFC 3986 and may not be complete!
my $url_re = qr{
    # https://tools.ietf.org/html/rfc3986#section-2
    # URI = scheme ":" hier-part ...;  hier-part = "//" ...
    # scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    [A-Za-z][A-Za-z0-9+\-.]* ://
    # gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    # sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
    #     / "*" / "+" / "," / ";" / "="
    # unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    ( [:/?#\[\]@!\$&'()*+,;=A-Za-z0-9\-._~]
    # pct-encoded = "%" HEXDIG HEXDIG
    | %[0-9A-Fa-f]{2} )*
}x;

while ($str=~/($url_re)/g) {
    my $uri = URI->new($1);
    print "$uri\n";
}
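
As a rough sketch of why the fragment matters: once the hand-rolled regex has captured the whole URL, the URI module's fragment method should return the part after the "#". The sample text and the find_uris helper below are made up for illustration and are not from this thread, and the pattern is just the one above repeated (with a non-capturing group) so that the snippet runs on its own.

```perl
use warnings;
use strict;
use URI;

# Same pattern as above, repeated so this snippet is self-contained.
my $url_re = qr{
    [A-Za-z][A-Za-z0-9+\-.]* ://
    (?: [:/?#\[\]@!\$&'()*+,;=A-Za-z0-9\-._~]
    | %[0-9A-Fa-f]{2} )*
}x;

# Hypothetical helper: return a URI object for every match in a string.
sub find_uris {
    my ($s) = @_;
    my @found;
    while ($s =~ /($url_re)/g) {
        push @found, URI->new($1);
    }
    return @found;
}

# Hypothetical sample text, not taken from the original post.
my $text = 'see <http://www.example.com/docs?id=7#install> for details';

for my $uri (find_uris($text)) {
    my $frag = $uri->fragment;    # undef when the URI has no #fragment
    print "$uri\n";
    print "   Query: ", $uri->query, "\n";
    print "Fragment: ", (defined $frag ? $frag : '(none)'), "\n";
}
```

Note that the character class also accepts characters such as "." and ")", so a URL at the end of a sentence may pick up trailing punctuation; that is one of the ways this pattern may be incomplete.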

Hope this helps,
-- Hauke D
