
replacing text file separators in regex

by Anonymous Monk
on Jul 05, 2002 at 19:59 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

This relates to the question I asked yesterday at Net::FTP - syntax with username containing /. It turns out to be a separate, though related, issue from what I first thought, so here goes:

The subroutine I'm still having a problem with is:

    sub ParseURL {
        my($url)= shift;
        my($protocol, $site, $login, $password);
        $url=~ /^(\w+):\/*([^\/]+)\/(.*)$/;    # chunk the URL components
        $protocol= uc($1);                      # preserve matches in real variables
        $site= $2;
        $path= $3;
        if ($site=~ /\@/) {                     # check for the optional login field
            ($login, $site)= split("\@", $site);
            ($login, $password)=                # check for the optional password field
                split(":", $login);
            return($protocol, $site, $login, $password, $path);
        } else {                                # no account information
            return($protocol, $site, "", "", $path);
        }
    }

Basically my ftp username contains a / (XX/yyyy), and XX is getting parsed as the hostname of the ftp site.

As was suggested, I think the fastest way to fix this is to change the item separator in the text file to a comma or something, rather than a /.

The regex I'm having trouble with is:

$url=~ /^(\w+):\/*([^\/]+)\/(.*)$/;

The way I'm reading this is: /^(\w+) captures $protocol, \/*([^\/]+) captures $site, and \/(.*)$ captures $path.
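To make that reading concrete, here is a minimal sketch that runs the original pattern against the problem URL (slash separator, as in the original text file) and prints what each group captures. It only illustrates the symptom described above and is not a fix.

```perl
use strict;
use warnings;

# URL with a slash inside the username, as described in the question.
my $url = 'ftp://XX/yyyy:password@123.456.789.012/';

my ($protocol, $site, $path);
if ($url =~ /^(\w+):\/*([^\/]+)\/(.*)$/) {
    # [^\/]+ stops at the first slash after "://", i.e. the one inside "XX/yyyy".
    ($protocol, $site, $path) = ($1, $2, $3);
    print "protocol=$protocol site=$site path=$path\n";
}
```

Because [^\/]+ cannot cross a slash, $site ends up as XX and everything else, login and host included, lands in $path.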

So, based on the above, if I use a comma separator for the $site portion, the line should look like:

$url=~ /^(\w+):\/*([^\/]+),(.*)$/;

and the appropriate line in my text file looks like:

ftp://XX/yyyy:password@123.456.789.012,/

However, when I then run the script, I get several "uninitialized value" errors: one for the pattern match ($site=~ /\@/), one for a line trying to use $host, and one for a line trying to use $path.
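A small sketch of what seems to be happening with the comma version: the pattern does not match that line at all, so $1, $2 and $3 are not set by it, and the assignments that follow copy undefined values. The variable $matched is only there for illustration.

```perl
use strict;
use warnings;

# Sample line from the text file, with a comma between site and path.
my $url = 'ftp://XX/yyyy:password@123.456.789.012,/';

# Comma version of the pattern. [^\/]+ still cannot cross the slash in
# "XX/yyyy", and the character after "XX" is a slash, not a comma.
my $matched = ($url =~ /^(\w+):\/*([^\/]+),(.*)$/) ? 1 : 0;

if ($matched) {
    print "site=$2 path=$3\n";
} else {
    print "no match: the capture variables must not be used here\n";
}
```

Testing the result of the match before touching $1, $2 and $3 avoids the warnings, though the pattern itself still needs to allow a slash in the login part.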

So, it seems I'm missing something (hopefully) fairly obvious, but I can't seem to figure it out.

Any help appreciated.

Thanks,

Glenn

Replies are listed 'Best First'.
Re: replacing text file separators in regex
by ehdonhon (Curate) on Jul 05, 2002 at 20:11 UTC
    How about this?
    $url=~ m|^(\w+):/*((.*\@)?[^/]+)/(.*)$|;
    $protocol= uc($1);
    $site= $2;
    $path= $4;

    That would accept everything up till the last '@' as part of the site regardless of any '/' characters that it might contain.

    UPDATE: Corrected the '@', which needed to be escaped.
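As a rough sketch of how this pattern could slot into the original subroutine: parse_url below is a hypothetical rewrite of ParseURL, not the poster's code, and ftp.example.com in the usage note is a placeholder host. It checks that the match succeeded, splits the login from the host at the last @, and limits the login split to two fields.

```perl
use strict;
use warnings;

# Hypothetical rewrite of ParseURL using the pattern suggested above.
# Returns an empty list when the URL does not match.
sub parse_url {
    my ($url) = @_;
    my ($protocol, $site, $path);

    # $2 is everything up to the slash after the host; $3 is the optional
    # "login@" group and is ignored here, so the path is in $4.
    if ($url =~ m|^(\w+):/*((.*\@)?[^/]+)/(.*)$|) {
        ($protocol, $site, $path) = (uc($1), $2, $4);
    } else {
        return;
    }

    my ($login, $password) = ('', '');
    if ($site =~ /^(.*)\@([^\@]*)$/) {          # split at the last @
        ($login, $site) = ($1, $2);
        ($login, $password) = split /:/, $login, 2;
        $password = '' unless defined $password;
    }
    return ($protocol, $site, $login, $password, $path);
}

my @parts = parse_url('ftp://XX/yyyy:password@123.456.789.012/');
print join('|', @parts), "\n";
```

A URL without login information, such as ftp://ftp.example.com/pub/file.txt, should still parse, with empty login and password. One caveat that appears to follow from the greedy .* in the optional group: an @ anywhere in the path would be absorbed into the site portion.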
Re: replacing text file separators in regex
by perlkid (Novice) on Jul 06, 2002 at 04:31 UTC
    I would think the following RE works for your case.
    
    #Given that a comma separator is used between site and path
    #$2 may or may not contain / or @ 
    
    $url=~ /^(\w+):\/\/(.*),(.*)$/ ;
    
    
    I assumed that your data looks like one of the following cases:
    
    ftp://XX/yyy:password@123.123.123.123,/
    ftp://site,path
    ftp://site,
    
    For the "uninitialized value" warning, use the defined function,
    
    for example:
    if (defined($site) && $site =~ /\@/) { .. } else { .. }
    
    -perlkid
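For illustration, a short sketch that runs the comma-separated pattern over the three sample lines above and only reads the captures when the match succeeds. The @results array is just a place to collect the pieces and is not part of the original script.

```perl
use strict;
use warnings;

# The three sample lines assumed in the reply above.
my @lines = (
    'ftp://XX/yyy:password@123.123.123.123,/',
    'ftp://site,path',
    'ftp://site,',
);

my @results;
for my $url (@lines) {
    # The first (.*) is greedy, so the split happens at the last comma.
    if ($url =~ /^(\w+):\/\/(.*),(.*)$/) {
        push @results, [uc($1), $2, $3];
        print "site='$2' path='$3'\n";
    } else {
        warn "unparseable line: $url\n";
    }
}
```

Since the first (.*) accepts any character, the login part may contain / or @ without breaking the match, and the login and host can then be separated in a second step.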
    
Re: replacing text file separators in regex
by fglock (Vicar) on Jul 05, 2002 at 20:16 UTC

    [^\/] means "not a slash". You might try something like "find the at-sign, then the slash":

    # untested
    $url=~ /^(\w+):\/*(.*?\@[^\/]+)\/(.*)$/;

      The at-sign might not exist at all. If that is the case, your regexp will not match.
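A small sketch of that point: the pattern as posted requires an @, so a URL without login information fails to match. The $optional variant, which wraps the login part in an optional non-capturing group, is an assumption of mine rather than something posted in the thread, and ftp.example.com is a placeholder host.

```perl
use strict;
use warnings;

my $with_login    = 'ftp://XX/yyyy:password@123.456.789.012/';
my $without_login = 'ftp://ftp.example.com/pub/';    # placeholder host

my $required = qr/^(\w+):\/*(.*?\@[^\/]+)\/(.*)$/;        # pattern as posted
my $optional = qr/^(\w+):\/*((?:.*?\@)?[^\/]+)\/(.*)$/;   # assumed variant

for my $url ($with_login, $without_login) {
    printf "%-45s required=%s optional=%s\n", $url,
        ($url =~ $required ? 'match' : 'no match'),
        ($url =~ $optional ? 'match' : 'no match');
}
```

With the optional group, the capture numbering stays the same as in the posted pattern: $2 is the site portion, including any login, and $3 is the path.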
