http://qs321.pair.com?node_id=59346

ryan has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I can't seem to uncover any information about this.

More out of interest than necessity, I have noticed that some sites, including merlyn's /cgi/go/ arrangement, appear to call CGI scripts by passing query strings after a '/' instead of a '?'.

Is this possible to do in Perl, or is it an Apache hack? And/or am I using the wrong terminology and hence not getting any useful search results? :)

Replies are listed 'Best First'.
Re: CGI queries without '?'
by japhy (Canon) on Feb 19, 2001 at 11:07 UTC
    Most web servers support the PATH_INFO behavior. If your web server can execute a program at http://www.foo.com/prog, then going to http://www.foo.com/prog/this/too will set $ENV{PATH_INFO} to "/this/too".

    japhy -- Perl and Regex Hacker
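To make the PATH_INFO behavior described above concrete, here is a minimal sketch of a script that splits the extra path into segments. The helper name `path_segments` and the command-line fallback value are assumptions for illustration, not part of the original post; under a real web server, `$ENV{PATH_INFO}` is set by the server.

```perl
use strict;
use warnings;

# Split a PATH_INFO value such as "/this/too" into its segments.
# Returns an empty list when no extra path was supplied.
sub path_segments {
    my ($path_info) = @_;
    $path_info = '' unless defined $path_info;
    # grep drops the empty string produced by the leading slash
    return grep { length } split m{/}, $path_info;
}

# The fallback value is only here so the sketch also runs from the
# command line, where the server has not set PATH_INFO.
my @segments = path_segments( $ENV{PATH_INFO} // '/this/too' );

print "Content-Type: text/plain\n\n";
print "segment: $_\n" for @segments;
```

Requested as http://www.foo.com/prog/this/too, a script like this would see the segments "this" and "too".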
      Indeed it does.

      This being the case, can CGI.pm correctly handle such information gathering, or is it up to the programmer to risk it all and handle the variable on their own? ... (I'm not saying I condone this practice)

        IMHO it is better to read the path info string through CGI.pm's CGI::path_info method than to access $ENV{PATH_INFO} directly. There are a lot of good reasons to do this; here are some of mine:

        • CGI deals with implementation issues. If the structure of the %ENV hash ever changed, my code wouldn't break, assuming the module is kept up to date.
        • CGI::path_info corrects common problems in certain web servers, providing a more portable solution than direct access to $ENV{PATH_INFO}.
        • You get documentation of the CGI::path_info method to explain what it does. This means less documentation for me, I like that =) It's much more difficult to find docs explaining the %ENV hash well.
        • It just looks prettier.

        In general, anytime I need to access the %ENV hash, I try to look in CGI.pm's docs for a method to get at the data I want.
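A minimal sketch of the CGI.pm approach recommended above. It assumes CGI.pm is installed (it has not shipped with Perl itself since 5.22), and the environment values set at the top are made up so the sketch can run outside a web server; a real server supplies them.

```perl
use strict;
use warnings;
use CGI;    # install from CPAN if your Perl does not bundle it

# Simulated request environment (assumption, for command-line runs only).
$ENV{REQUEST_METHOD} ||= 'GET';
$ENV{QUERY_STRING}   ||= '';
$ENV{SCRIPT_NAME}    ||= '/prog';
$ENV{PATH_INFO}      ||= '/this/too';

my $q    = CGI->new;
my $path = $q->path_info;    # the extra path, as reported by CGI.pm

print $q->header('text/plain');
print "path info: $path\n";
```

The only change from the direct approach is that the value comes from `$q->path_info` rather than from `%ENV`.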

        ??? It has nothing to do with CGI.pm. The pathinfo will be in the %ENV hash.

        my $pathinfo = $ENV{PATH_INFO} || '';

        update: Good point, dkubb. Thanks. I didn't realize I could get the PATH_INFO from CGI itself.

Re: CGI queries without '?'
by lhoward (Vicar) on Feb 19, 2001 at 19:08 UTC
    I've done this kind of thing using Apache's mod_rewrite. This is not suitable for handling form submissions, but if you're using it to build links between your pages you can hide the "CGI" nature of your pages entirely. Here's an Apache config example:
    RewriteEngine on
    RewriteRule ^/foo/([0-9]+)\.html$ /cgi-bin/foo.cgi?id=$1 [PT]
    With this code the user will only see something like: /foo/54236.html in the HTML, but it will be transformed by Apache to /cgi-bin/foo.cgi?id=54236.
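To illustrate what the rewrite rule above does to a URL, here is a small Perl sketch that applies the same pattern by hand. It is only an illustration of the mapping, not how Apache implements it, and `rewrite_path` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Perl equivalent of the RewriteRule pattern, for illustration only.
# Returns the internal CGI URL, or undef when the path does not match.
sub rewrite_path {
    my ($path) = @_;
    return $path =~ m{^/foo/([0-9]+)\.html\z}
        ? "/cgi-bin/foo.cgi?id=$1"
        : undef;
}

print rewrite_path('/foo/54236.html'), "\n";   # /cgi-bin/foo.cgi?id=54236
```

Paths that do not consist of digits followed by ".html" are left alone, which is why this approach suits links between pages rather than arbitrary form input.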
(dws) Warning Re: CGI queries without '?'
by dws (Chancellor) on Feb 20, 2001 at 00:25 UTC
    If you're writing web-based applications (say, to tunnel some application-specific protocol over HTTP), there's a really big gotcha waiting for you if you pass parameters on the path after the CGI without also passing query-string parameters. You may not see this gotcha in development, only to discover it when your application gets out into the real world.

    The gotcha arises when there's a caching proxy server in the mix between your web server and the client browser.

    Caching proxy servers are getting more and more popular as corporations (and ISPs) attempt to deal with increasing demands for bandwidth. If you can cache content close to where it's being demanded, it's a big win overall.

    That can really screw up naive web applications.

    Caching proxy servers base their decisions on whether or not to cache on a couple of (sometimes configurable) rules. (For example, the rules MS Proxy uses are described here.) One of those rules is often "are there CGI parameters?" If you pass application parameters on the path without also passing some after a ? (and without using another incantation*), the caching proxy may hang on to the web server's response and reissue it when next asked, rather than passing the request back to the web server. If the URL points to dynamic content, or if your application depends on seeing that URL to maintain state (such as a heartbeat to tell it that the client is still alive), you're hosed in a way that can be hard to debug.

    *Incantations to defeat caching proxies include the use of an Expires: or an appropriate Cache-Control: response header.
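A minimal sketch of those incantations in a CGI script. The helper name `no_cache_headers` and the exact Cache-Control directives are assumptions chosen for illustration; which directives a given proxy honors may vary.

```perl
use strict;
use warnings;

# Build response headers that ask caches not to reuse the response.
sub no_cache_headers {
    my ($content_type) = @_;
    return join '',
        "Content-Type: $content_type\n",
        "Cache-Control: no-cache, no-store, must-revalidate\n",
        "Expires: Thu, 01 Jan 1970 00:00:00 GMT\n",    # a date in the past
        "\n";                                           # blank line ends the headers
}

print no_cache_headers('text/plain');
print "heartbeat received at ", scalar(localtime), "\n";
```

With headers like these, a path-only URL such as a heartbeat request should be passed back to the web server each time instead of being answered from a proxy's cache.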

Re: CGI queries without '?'
by Masem (Monsignor) on Feb 19, 2001 at 18:16 UTC
    While others have pointed out how the PATH_INFO stuff works, I'll toss in my 2 cents to point out that beyond this, there is no way to get around using '?', '&' and '=' as the special characters for a CGI request, as defined by the CGI specification. Since the browser has to create this string, and the web server has to be able to interpret it, you have no way of changing it.
Re: CGI queries without '?'
by mischief (Hermit) on Feb 19, 2001 at 21:28 UTC
    This is a good trick to use when you want your site to be more likely to be indexed by search engines. Some robots don't follow links that have query strings on the end (i.e., links with "?var=something"), so you can fool them by using the CGI environment variable PATH_INFO instead (however you choose to access it). You can also use it to make pages look half dynamic and half static to the search engines, to try to control what they index. For instance, if you've got a forum-type script that has posts archived in the format "/forum/msgnum/", you can add bits at the end with the query string, like "/forum/msgnum/?mode=threaded" or whatever (I think Slashdot might do something like this, not entirely sure though).
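A rough sketch of the forum idea above: the message number is taken from the path info and the display mode from the query string. The function name `parse_forum_request`, the default mode 'flat', and the sample values are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical forum script: the message number comes from PATH_INFO
# ("/1234/") and the display mode from the query string ("mode=threaded").
sub parse_forum_request {
    my ($path_info, $query_string) = @_;
    my ($msgnum) = ( $path_info    // '' ) =~ m{^/([0-9]+)/?\z};
    my ($mode)   = ( $query_string // '' ) =~ m{(?:^|[&;])mode=([a-z]+)};
    return ( $msgnum, $mode // 'flat' );    # 'flat' is an assumed default
}

# Fallback values are only so the sketch runs from the command line.
my ($msgnum, $mode) = parse_forum_request(
    $ENV{PATH_INFO}    // '/1234/',
    $ENV{QUERY_STRING} // 'mode=threaded',
);

print "Content-Type: text/plain\n\n";
print defined $msgnum
    ? "message $msgnum, mode $mode\n"
    : "no message number in the path\n";
```

A robot that ignores query strings would still see a distinct, static-looking URL per message, while a browser can add "?mode=threaded" to change the view.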
Re: CGI queries without '?'
by tune (Curate) on Feb 19, 2001 at 21:18 UTC
    You can intercept the URL with mod_perl too, and do what you want with it. For example, you can pretend that you have plenty of CGI pages (index.cgi, blahblah.cgi, etc.) and use the name only to decide which function to call. Sweet, isn't it?
    Or you can do URLs like http://someserver/somepath/12,45,3434,221 too :)

    -- tune
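The "find which function to call" idea can be sketched as a plain Perl dispatch table. This deliberately leaves out the mod_perl handler API; the handler names, the return strings, and the `dispatch` function are hypothetical and only show the lookup step.

```perl
use strict;
use warnings;

# Hypothetical page handlers; in a real site each would build a page.
my %handler_for = (
    'index.cgi'    => sub { "index page" },
    'blahblah.cgi' => sub { "blahblah page, args: @_" },
);

# Map a request path such as "/blahblah.cgi/12,45,3434,221" to a handler
# and a list of comma-separated arguments.
sub dispatch {
    my ($path) = @_;
    my ($name, $args) = $path =~ m{^/([^/]+)(?:/(.*))?\z}s
        or return "404 not found";
    my $handler = $handler_for{$name} or return "404 not found";
    return $handler->( split /,/, $args // '' );
}

print dispatch('/index.cgi'), "\n";
print dispatch('/blahblah.cgi/12,45,3434,221'), "\n";
```

None of the ".cgi" names needs to exist as a file; the name in the URL is just a key into the table.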