http://qs321.pair.com?node_id=1066685

bangor has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I have a CGI script which is called with a trailing path:
http://www.example.com/script.pl/foo/bar
To get the individual elements of the path I use a subroutine
sub get_paths_from_url {
    my $path = $ENV{'PATH_INFO'};
    return if not $path;
    $path =~ s|^/||;
    $path =~ s|/$||;
    my @paths = split('/', $path);
    return \@paths;
}
I'm a bit wary of this as I'm sure I've read that parsing anything from the web is error prone. My question is - is there anything inherently dangerous about this approach, and should I just use a module?

Replies are listed 'Best First'.
Re: Accessing %ENV directly in script
by afoken (Chancellor) on Dec 11, 2013 at 20:15 UTC

    Just a minor note: Bad people often insert ".." into URLs, sometimes encoded, sometimes plain. See http://en.wikipedia.org/wiki/Directory_traversal_attack. As long as you use @paths just as a way to pass parameters to your script, this may be harmless. But as soon as you construct a filename from @paths and a prefix, those bad people may gain access to files that were not meant to be accessible via the web. Also consider replacing backslashes with forward slashes (tr|\\|/|), because some people simply can't see a difference between them, and collapsing multiple slashes into single slashes (s|/+|/|g) before splitting.
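
The clean-up steps described above could be combined roughly as follows. This is only a sketch: the name split_path_info is hypothetical, and refusing any path that contains a "." or ".." segment (rather than trying to repair it) is an assumption, not something the original script does.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): normalise a
# PATH_INFO-style string and split it into segments.
sub split_path_info {
    my ($path) = @_;
    return [] unless defined $path && length $path;

    $path =~ tr{\\}{/};    # backslashes become forward slashes
    $path =~ s{/+}{/}g;    # collapse runs of slashes into one
    $path =~ s{^/}{};      # drop the leading slash
    $path =~ s{/$}{};      # drop the trailing slash

    my @segments = split m{/}, $path;

    # Assumption: reject traversal attempts outright instead of fixing them.
    for my $segment (@segments) {
        return if $segment eq '.' || $segment eq '..';
    }
    return \@segments;
}
```

Called as split_path_info($ENV{'PATH_INFO'}), it returns an array reference of segments, or nothing at all when a traversal segment is present, so the caller can treat a false return value as a bad request.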

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Hi Alexander. I hadn't thought of people constructing the URL by hand (if that's what you mean) but then that's exactly what I do myself when testing. So fixing the slashes is designed to forgive errors here. I'm not too worried about rogue elements in @paths as any element in there must match exactly a key in a hardcoded hash otherwise it's ignored.
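
The exact-match check against a hardcoded hash might look something like this. The hash name, its keys and the handler names are invented for illustration; the real script's hash is not shown in the thread.

```perl
use strict;
use warnings;

# Hypothetical whitelist: keys and values are made up for this sketch.
my %known_section = (
    foo => 'show_foo',
    bar => 'show_bar',
);

# Keep only the segments that exactly match a key in the whitelist;
# everything else (including "..") is silently ignored.
sub known_segments {
    my (@segments) = @_;
    return grep { exists $known_section{$_} } @segments;
}
```

Because only exact key matches survive, a stray ".." or an unexpected name never reaches the code that acts on the path.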
Re: Accessing %ENV directly in script
by educated_foo (Vicar) on Dec 11, 2013 at 20:19 UTC
    Doing things with stuff is error-prone, but fortunately you don't catch cooties just from touching things that have been on the internet...

    Just don't make any assumptions about it -- it's just some bytes. If you expect a string with some maximum length, or without certain characters, verify it. In your case, you're just looking at /-separated bits of URL-encoded text, so it's not a big deal.
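
One way to verify such expectations is sketched below. The 64-character limit and the allowed character set are assumptions chosen for illustration, and is_acceptable_segment is a hypothetical name.

```perl
use strict;
use warnings;

# Assumed limit, for illustration only.
my $MAX_LENGTH = 64;

# Returns 1 if the segment is short enough and contains only
# letters, digits, underscores and hyphens; 0 otherwise.
sub is_acceptable_segment {
    my ($segment) = @_;
    return 0 unless defined $segment;
    return 0 if length($segment) > $MAX_LENGTH;
    # \A and \z anchor the whole string, so a trailing newline fails too.
    return $segment =~ /\A[A-Za-z0-9_-]+\z/ ? 1 : 0;
}
```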

Re: Accessing %ENV directly in script
by taint (Chaplain) on Dec 11, 2013 at 19:46 UTC

    Aside from not "sanitizing" the input, there shouldn't be anything inherently dangerous. Nothing is error-proof or completely safe when you expose yourself to the internet.

    Your post reminded me of a weblog project on sourceforge.net called blosxom. While the project seems somewhat abandoned, it uses the same URI scheme you're attempting to use, so it may reveal some useful bits for your implementation. I'll take another look at it and update my reply should I find anything to add.

    --Chris

    Yes. What say about me, is true.