PerlMonks  

CGI fails to urlencode & chars in outbound url's

by BrowserUk (Patriarch)
on Jun 12, 2002 at 10:36 UTC ( [id://173755]=perlquestion )

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

In the following line, @list contains relative paths read from the underlying filesystem using readdir

print table( Tr[ map { td[ a( { -href => "http://www.somedomain/script.pl?path=$path/$_" }, $_ ) ] } @list ] );

The problem is that some of these directories can contain spaces and/or & characters (and others, no doubt, but I haven't fallen foul of those yet!).

CGI correctly urlencodes (on output) the spaces as %20 but doesn't encode the & as %26, which means that when the URL is parsed on input, the & is treated as a search parameter separator. So for the URL:

http://www.somedomain/script.pl?Range=./things/Chalk%20&%20Cheese/

param() returns 'Chalk ' instead of 'Chalk & Cheese'

Is there some way of forcing CGI to encode the & as %26?

Is there a urlencode() function I could use?

Assuming there is a urlencode(), how would I incorporate it into the line above so that the map applies it to each of the parameters?

My attempts at searches of local and online docs have so far not turned up a solution to this.

Thanks.

Replies are listed 'Best First'.
Re: CGI fails to urlencode & chars in outbound url's
by cjf (Parson) on Jun 12, 2002 at 10:42 UTC

      Thanks. Exactly what I needed.

      Without going into long explanations of Perl internals, is there a reason this has been implemented as:

      use URI::Escape; $encoded = uri_escape($unsafe_uri);

      instead of as (what seems to me would be more intuitive and easier to find):

      use URL qw( urlencode ); $encoded = urlencode( $unsafe_uri );

      Or is this a simple case of "That's the way the author chose to do it"?

        Or is this a simple case of "That's the way the author chose to do it"?

        Pretty much. The author of a module can name it whatever they want to. Whether or not CPAN allows them to upload it under that name is a different issue (you can probably find the answer in the CPAN FAQ).

        As for why the URI module was named URI instead of URL, there is a slight difference between the two. More information on this is available here.

Re: CGI fails to urlencode & chars in outbound url's
by projekt21 (Friar) on Jun 12, 2002 at 11:45 UTC

    You also may use CGI.pm's own escape function:

    use CGI;
    print CGI::escape("Chalk & Cheese");   # prints: Chalk%20%26%20Cheese

    alex pleiner <alex@zeitform.de>
    zeitform Internet Dienste

      I have now found a CGI::escapeHTML() function in the perldoc CGI, but not a CGI::escape(). Your snippet works (for me also), so it is obviously there; I just can't find any docs for it.

      The perldoc CGI suggests that escapeHTML() will (often automatically if autoescaping is on, which it is by default and I haven't changed it) handle the conversion of & to &amp;, but this contradicts the evidence I am seeing - which would cause me to re-evaluate the evidence except that:

      1) I can see that the spaces are being escaped to %20, but the & stays resolutely unchanged.

      2) Wrapping $path/$_ in uri_escape() (from URI::Escape) in the original line cures my problem.
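
A minimal sketch of point 2, assuming URI::Escape is installed. The values of `$base`, `$path` and `@list` are illustrative stand-ins for those in the original line, and plain string concatenation is used instead of CGI.pm's a() so that the encoded value is easy to see.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape);

# Hypothetical stand-ins for the values in the original post.
my $base = 'http://www.somedomain/script.pl';
my $path = './things';
my @list = ('Chalk & Cheese', 'plain');

# Percent-encode each parameter value before it goes into the query string,
# so that a literal & in a directory name becomes %26.
my @urls = map { "$base?path=" . uri_escape("$path/$_") } @list;

print "$_\n" for @urls;
```

Note that uri_escape() with its default settings also encodes the / characters in the value, which is harmless inside a query parameter because the receiving script decodes them again.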

      However, escapeHTML seems to be dependent (I haven't understood the docs fully yet) upon having or using a character set of ISO-8859-1?

      I'm passing this along in case this is something that isn't confined to just my system/OS.

        CGI::escape comes from CGI::Util and is not documented (the code is the documentation :-). It is used within CGI.pm and is usable outside, too.

        #### from CGI/Util.pm
        sub escape {
          shift() if ref($_[0]) || (defined $_[1] && $_[0] eq $CGI::DefaultClass);
          my $toencode = shift;
          return undef unless defined($toencode);
          $EBCDIC = "\t" ne "\011";
          if ($EBCDIC) {
            $toencode=~s/([^a-zA-Z0-9_.-])/uc sprintf("%%%02x",$E2A[ord($1)])/eg;
          } else {
            $toencode=~s/([^a-zA-Z0-9_.-])/uc sprintf("%%%02x",ord($1))/eg;
          }
          return $toencode;
        }

        For more info see Dump a directory as links from CGI.

        escapeHTML is fine to produce HTML, but not useful for URIs (your "&" is the best example). If we have an unescaped "&" then this is a parameter delimiter.
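
The two kinds of escaping work at different layers, and a link usually needs both. The sketch below is only an illustration: `pct_encode` and `html_escape` are hypothetical helpers written for this example (the first reuses the character class from the CGI::Util code quoted above), and the `sort=name` parameter is made up to show a genuine parameter delimiter.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode one query-string value.
# Assumes a byte string (plain ASCII in this example).
sub pct_encode {
    my ($s) = @_;
    $s =~ s/([^a-zA-Z0-9_.-])/sprintf("%%%02X", ord($1))/eg;
    return $s;
}

# Hypothetical helper: escape the characters that are special in HTML.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

my $dir = 'Chalk & Cheese';

# URL layer: the & inside the value becomes %26, the delimiter & stays raw.
my $url = 'script.pl?path=' . pct_encode($dir) . '&sort=name';

# HTML layer: the delimiter & becomes &amp; inside the attribute.
my $link = '<a href="' . html_escape($url) . '">' . html_escape($dir) . '</a>';

print "$url\n$link\n";
```

A browser undoes the HTML layer when it reads the attribute, so the server only ever sees the URL layer. That is why HTML escaping alone cannot protect an & that belongs to a parameter value.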

        alex pleiner <alex@zeitform.de>
        zeitform Internet Dienste

      You also may use CGI.pm's own escape function:

      This is probably irrelevant to this discussion. The a() function is being used, which already escapes its arguments. Or at least CGI.pm version 2.752 does.

      2;0 juerd@ouranos:~$ perl -MCGI=a -le'print a({ -href => "&&&" }, "asdf")'
      <a href="&amp;&amp;&amp;">asdf</a>

      - Yes, I reinvent wheels.
      - Spam: Visit eurotraQ.
      

        But that's exactly what nobody wants, as the & of &amp; is treated as a parameter delimiter.

        #!/usr/bin/perl -w
        use strict;
        use CGI;

        my $q = new CGI;
        print $q->header;
        print "<pre>\n";
        print $_, "=", $q->param($_), "\n" for $q->param;
        print "\n</pre>\n";

        prints

        foo=bar
        amp=
        baz=

        for

        http://electra.igd.fhg.de/cgi-bin/test3.pl?foo=bar&amp;baz

        but it should print

        foo=bar&baz
        You need to use CGI::escape or URI::Escape but not CGI::escapeHTML.

        Update: Just for completeness: a() calls CGI::Util::make_attributes on each attribute, the latter calls CGI::Util::simple_escape for escaping ("&"->"&amp;" and some others). CGI::Util::escape does something different (see my last post).
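
To see why the test script above reports three parameters, here is a rough sketch of query-string parsing. It assumes, as that output suggests, that both & and ; act as parameter delimiters. `parse_query` is a hypothetical helper written for this illustration, not CGI.pm's actual parser.

```perl
use strict;
use warnings;

# Hypothetical helper: split a query string into [name, value] pairs,
# treating both & and ; as delimiters and decoding %XX afterwards.
sub parse_query {
    my ($query) = @_;
    my @pairs;
    for my $part (split /[&;]/, $query) {
        next if $part eq '';
        my ($name, $value) = split /=/, $part, 2;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                               # + means space in a query string
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;   # decode percent-escapes
        }
        push @pairs, [$name, $value];
    }
    return @pairs;
}

for my $query ('foo=bar&amp;baz', 'foo=bar%26baz') {
    print "$query\n";
    print "  $_->[0]=$_->[1]\n" for parse_query($query);
}
```

The first query string splits into foo, amp and baz, while the second yields a single parameter foo whose value is bar&baz, because the %26 is only decoded after the split.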

        alex pleiner <alex@zeitform.de>
        zeitform Internet Dienste

Re: CGI fails to urlencode & chars in outbound url's
by Juerd (Abbot) on Jun 12, 2002 at 11:02 UTC

    http://www.somedomain/script.pl?Range=./things/Chalk%20&amp;%20Cheese/

    &amp; in HTML is the ampersand character. That is also true within href attributes. If you don't believe me, have a look at this link: <a href="/&amp;&amp;&amp;">. Your code works. It is probably the space that confuses your script.pl.

    - Yes, I reinvent wheels.
    - Spam: Visit eurotraQ.
    

      Sorry! I forgot that I don't need to entity-encode the & within the <code> sections.

      It is definitely the & that is (was! Using uri_escape() fixed it) the problem.

      If I can work out how to edit the node, I will correct that!
