PerlMonks  

matching substring in url

by cioperl (Novice)
on Nov 08, 2019 at 18:17 UTC ( [id://11108485]=perlquestion )

cioperl has asked for the wisdom of the Perl Monks concerning the following question:

I'm struggling to capture a portion of a URL string (in a shell script) like this:

XXX="/blah1/blah2/blah3/1234567890;arg=AAA123BBB456CCC"
# alphanumeric:         ^^^^^^^^^^
# optional:                       ^^^^^^^^^^^^^^^^^^^^
# fixed string:                   ^^^^^
# alphanumeric:                        ^^^^^^^^^^^^^^^
echo "$XXX" | perl -n -e 'if (m[^/.+/([^/\?\s]+)(?:;arg=)?]) {print "$1\n";}'

The part I need to capture is: "1234567890" (alphanumeric string)
The trailing ";arg=AAA123BBB456CCC" is optional (may or may not appear in the string).

What I'm getting instead is:
1234567890;arg=AAA123BBB456CCC

Thanks for any help with this.


SOLVED:
I just added ";" to the negated character class:
echo "$XXX" | perl -n -e 'if (m[^/.+/([^/\?\s;]+)(?:;arg=)?]) {print "$1\n";}'
                                             ^
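A minimal sketch to confirm why the fix works. In the original pattern, the class [^/\?\s]+ is greedy and also matches ";", and the optional (?:;arg=)? is then satisfied by the empty string, so the whole suffix lands in $1. With ";" added to the class, the capture stops at the semicolon. The helper name extract_id is hypothetical, and the two inputs are the forms described in the question.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the fixed regex: the capture group may not
# contain "/", "?", whitespace or ";", so it stops right before ";arg=".
sub extract_id {
    my ($url) = @_;
    return $url =~ m[^/.+/([^/\?\s;]+)(?:;arg=)?] ? $1 : undef;
}

# Check both forms: with and without the optional ";arg=..." suffix.
for my $url ('/blah1/blah2/blah3/1234567890;arg=AAA123BBB456CCC',
             '/blah1/blah2/blah3/1234567890') {
    print extract_id($url), "\n";    # prints 1234567890 for both inputs
}
```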


Thanks for all the suggestions! I'll definitely look into URI for help with parsing.

Re: matching substring in url
by choroba (Cardinal) on Nov 08, 2019 at 19:22 UTC
    I thought I'd use URI, but the "query" part is usually separated by ? rather than ;. Fortunately, the semicolon also works, since URI treats ;arg=... as a parameter of the last path segment:
    #!/usr/bin/perl
    use warnings;
    use strict;

    use URI;

    use Test::More tests => 2;

    my %expect = (
        '/blah1/blah2/blah3/1234567890;arg=AAA123BBB456CCC' => '1234567890',
        '/blah1/blah2/blah3/1234567890'                     => '1234567890');

    for my $string (keys %expect) {
        my $uri = ('URI'->new($string)->path_segments)[-1];
        is $uri, $expect{$string};
    }
    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Thanks. I'm not familiar with URI, but it looks like the way to go.
Re: matching substring in url
by haukex (Archbishop) on Nov 08, 2019 at 19:24 UTC
      thanks guys!
Re: matching substring in url
by roboticus (Chancellor) on Nov 09, 2019 at 19:11 UTC

    cioperl:

    Yet another way you could do it:

    echo "$XXX" | perl -pe '$_=(split "/", (split ";")[0])[-1]'

    It might look a little magical at first, but it's actually pretty simple. Instead of using a regular expression, I first split the string on ";" and kept only the first bit ((split ";")[0]) to chop off the optional bits. Then I split that first bit on "/" and kept the last part ((split "/", ...)[-1]) to get the value you wanted. Finally, I assigned the result to $_ and used the -p option to print the value.

    I got there essentially by starting with something like this:

    my $url = "/bla/bla/123;yarg";
    my @tmp = split ";", $url;   # break string at semicolons
    @tmp = split "/", $tmp[0];   # break first chunk apart at "/"
    print $tmp[-1];              # Print the value we want

    Since you can extract a chunk of a list by subscripting it, I first contracted the above to:

    my $url = "/bla/bla/123;yarg";
    # Split on ";", keep only the first chunk, then split on "/"
    my @tmp = split "/", (split ";", $url)[0];
    # The last item holds the value we want
    print $tmp[-1];

    Then I got rid of the temporary array by subscripting the resulting list again, keeping *only* the final result:

    my $url = "/bla/bla/123;yarg";
    $url = (split "/", (split ";", $url)[0])[-1];
    print $url;

    Then, to turn it into a one-liner, I changed it to this:

    perl -e '$a="/bla/bla/123;yarg"; print (split "/", (split ";", $a)[0])[-1]'
    print (...) interpreted as function at -e line 1.
    syntax error at -e line 1, near ")["
    Execution of -e aborted due to compilation errors.

    Oops! The parser sees "print (" and thinks we're wrapping print's arguments in parentheses, so the [-1] that follows becomes a syntax error. I could've fixed the problem by putting a unary "+" in front of the thing to print, as that tells perl that the opening parenthesis doesn't start print's argument list:

    perl -e '$a="/bla/bla/123;yarg"; print +(split "/", (split ";", $a)[0])[-1]'
    123

    But I went a different route: since you're piping the value into perl with echo, I used the -p option instead. It reads each input line into $_, runs the code, and then prints $_, which gives the bit of code I suggested.
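    One caveat: under -p, the newline that echo appends stays in $_. When ;arg=... is present it goes away with the discarded chunk, but when it is absent it survives into the result, so the two cases print differently. Adding -l (perl -lpe ...) chomps each input line and appends a newline on output. The same nested split inside a script might look like this; the sub name last_segment is hypothetical, and chomp plays the role of -l.

```perl
use strict;
use warnings;

# Hypothetical helper: the nested split from the one-liner, as a sub.
# chomp removes the trailing newline first (as -l would on the command line),
# so the result is the same whether or not ";arg=..." is present.
sub last_segment {
    my ($line) = @_;
    chomp $line;
    my $last = (split "/", (split ";", $line)[0])[-1];
    return $last;
}

# Simulate lines as echo would deliver them, including the trailing newline.
for my $line ("/blah1/blah2/blah3/1234567890;arg=AAA123BBB456CCC\n",
              "/blah1/blah2/blah3/1234567890\n") {
    print last_segment($line), "\n";
}
```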

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: matching substring in url
by jcb (Parson) on Nov 09, 2019 at 00:50 UTC

    Sometimes Perl is not the right tool for the job. You are starting an entire instance of perl just for a regex match, when sed will do:

    echo "$XXX" | sed -e 's/;arg=.*$//' -e 's!^.*/!!'

    On my box, simply running perl -e 1 takes longer than the above sed command. For a shell script, these differences matter.

    Perl oneliners are great at the prompt, but if you are putting them into a shell script, you should probably move the entire script to Perl.
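    If the whole script does move to Perl, the two sed substitutions above carry over almost verbatim as s/// operators. A minimal sketch, assuming the URL is already held in a Perl variable; the sub name id_from_path is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the two sed expressions, in the same order.
sub id_from_path {
    my ($path) = @_;
    $path =~ s/;arg=.*$//;    # drop the optional ";arg=..." suffix
    $path =~ s!^.*/!!;        # drop everything up to and including the last "/"
    return $path;
}

print id_from_path('/blah1/blah2/blah3/1234567890;arg=AAA123BBB456CCC'), "\n";
print id_from_path('/blah1/blah2/blah3/1234567890'), "\n";
```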

      You have a good point, jcb. For simple one-liners, sed will do. Thanks!
Re: matching substring in url
by hippo (Bishop) on Nov 08, 2019 at 18:23 UTC
    echo "$XXX" | perl -ne '/\/(\w+);/ and print $1'
      Unfortunately, the semicolon is not guaranteed, so this won't work: the part starting with (and including) ";" may or may not appear in the URL.
      Any other suggestions?

        In that case:

        echo "$XXX" | perl -ne '/\/(\w+)[^\/]*?$/ and print $1'
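        This version works for both forms because [^\/]*?$ requires that no further "/" follow the captured word, which pins (\w+) to the last path segment; whatever comes after it (";arg=..." or nothing) is absorbed by [^\/]*?. A quick check as a sketch, with capture_id as a hypothetical wrapper around the same regex:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex from the one-liner above.
sub capture_id {
    my ($url) = @_;
    return $url =~ /\/(\w+)[^\/]*?$/ ? $1 : undef;
}

# Both inputs should yield the same captured value.
print capture_id($_), "\n"
    for '/blah1/blah2/blah3/1234567890;arg=AAA123BBB456CCC',
        '/blah1/blah2/blah3/1234567890';
```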

Node Type: perlquestion [id://11108485]
Approved by Discipulus
Front-paged by Corion