CGI fails to urlencode & chars in outbound url's

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

In the following line, @list contains relative paths read from the underlying filesystem using readdir

print table( Tr[ map { td[ a( { -href => "http://www.somedomain/script.pl?path=$path/$_" }, $_ ) ] } @list ] );

The problem is that some of these directories can contain spaces and/or & characters (and others, no doubt, but I haven't fallen foul of those yet!).

CGI correctly URL-encodes the spaces as %20 on output but doesn't encode the & as %26, which means that when the URL is parsed on input, the & is treated as a search-parameter separator. So for the URL:

http://www.somedomain/script.pl?Range=./things/Chalk%20&amp;%20Cheese/

param() returns 'Chalk ' instead of 'Chalk & Cheese'

Is there some way of forcing CGI to encode the & as %26?

Is there a urlencode() function I could use?

Assuming there is a urlencode(), how would I incorporate this with the line above so as to have the map operate it upon each of the parameters?

My attempts at searches of local and online docs have so far not turned up a solution to this.

Thanks.
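
To illustrate the symptom described above, here is a minimal sketch of how a query string gets split on input. `parse_query` is a hypothetical, simplified stand-in (it is not CGI.pm's actual parser, which also handles `;` separators, repeated keys and more); it only shows why a raw & truncates the value while %26 survives.

```perl
use strict;
use warnings;

# Hypothetical, simplified query-string parser (not CGI.pm's own code):
# split the pairs on '&', split each pair on the first '=', decode %XX.
sub parse_query {
    my ($query) = @_;
    my %param;
    for my $pair (split /&/, $query) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        s/%([0-9A-Fa-f]{2})/chr hex $1/eg for $name, $value;
        $param{$name} = $value;
    }
    return %param;
}

# Raw '&' in the value: the parser sees two parameters.
my %raw = parse_query('Range=./things/Chalk%20&%20Cheese/');

# '&' sent as %26: the value arrives intact.
my %escaped = parse_query('Range=./things/Chalk%20%26%20Cheese/');

print "[$raw{Range}]\n";
print "[$escaped{Range}]\n";
```

With the raw &, everything after it becomes a second, valueless parameter, so the value of Range ends at the space after Chalk.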

Replies are listed 'Best First'.
Re: CGI fails to urlencode & chars in outbound url's by cjf (Parson) on Jun 12, 2002 at 10:42 UTC
Take a look at the uri_escape() function of URI::Escape.
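
As a rough sketch of how uri_escape() could be combined with the map from the question: the `$path` and `@list` values below are made-up stand-ins, and the links are built as plain strings so the example runs without CGI.pm. It assumes URI::Escape is installed.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape);

# Hypothetical stand-ins for the values in the question.
my $path = './things';
my @list = ('Chalk & Cheese', 'plain');

# Escape the whole parameter value, so that '&', ' ' and '/' cannot be
# misread as URL syntax when the query string is parsed on input.
my @hrefs = map {
    'http://www.somedomain/script.pl?path=' . uri_escape("$path/$_")
} @list;

print "$_\n" for @hrefs;
```

Each resulting string could then be passed as the -href value to a(), with the unescaped name kept as the link text.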
Re: Re: CGI fails to urlencode & chars in outbound url's by BrowserUk (Patriarch) on Jun 12, 2002 at 11:07 UTC
Thanks. Exactly what I needed. Without going into long explanations of Perl internals, is there a reason this has been implemented as `use URI::Escape; $encoded = uri_escape($unsafe_uri);` instead of as (what seems to me would be more intuitive, and easier to find) `use URL qw( urlencode ); $encoded = urlencode( $unsafe_uri );`? Or is this a simple case of "That's the way the author chose to do it"?
Re(3): CGI fails to urlencode & chars in outbound url's by cjf (Parson) on Jun 12, 2002 at 11:37 UTC
"Or is this a simple case of 'That's the way the author chose to do it'?" Pretty much. The author of a module can name it whatever they want to. Whether or not CPAN allows them to upload it under that name is a different issue (you can probably find the answer in The CPAN FAQ). As for why the URI module was named URI instead of URL, there is a slight difference between the two. More information on this is available here.
Re: CGI fails to urlencode & chars in outbound url's by projekt21 (Friar) on Jun 12, 2002 at 11:45 UTC
You may also use CGI.pm's own escape function: `use CGI; print CGI::escape("Chalk & Cheese"); # prints: Chalk%20%26%20Cheese` alex pleiner <alex@zeitform.de> zeitform Internet Dienste
Re: Re: CGI fails to urlencode & chars in outbound url's by BrowserUk (Patriarch) on Jun 12, 2002 at 12:22 UTC
I have now found a CGI::escapeHTML() function in perldoc CGI, but not a CGI::escape(). Your snippet works (for me also), so it is obviously there; I just can't find any docs for it. The perldoc CGI suggests that escapeHTML() will handle the conversion of `&` to `&amp;` (often automatically if autoescaping is on, which it is by default, and I haven't changed it), but this contradicts the evidence I am seeing. That would cause me to re-evaluate the evidence, except that: 1) I can see that the spaces are being escaped to %20, but the & stays resolutely unchanged. 2) Adding URI::Escape's uri_escape() around $path/$_ in the original line cures my problem. However, escapeHTML seems to be dependent (I haven't understood the docs fully yet) upon having or using a character set of ISO-8859-1? I'm passing this along in case this is something that isn't confined to just my system/OS.
Re: Re: Re: CGI fails to urlencode & chars in outbound url's by projekt21 (Friar) on Jun 12, 2002 at 15:39 UTC
CGI::escape comes from CGI::Util and is not documented (the code is the documentation :-). It is used within CGI.pm and is usable outside, too.

    #### from CGI/Util.pm
    sub escape {
      shift() if ref($_[0]) || (defined $_[1] && $_[0] eq $CGI::DefaultClass);
      my $toencode = shift;
      return undef unless defined($toencode);
      $EBCDIC = "\t" ne "\011";
      if ($EBCDIC) {
        $toencode=~s/([^a-zA-Z0-9_.-])/uc sprintf("%%%02x",$E2A[ord($1)])/eg;
      } else {
        $toencode=~s/([^a-zA-Z0-9_.-])/uc sprintf("%%%02x",ord($1))/eg;
      }
      return $toencode;
    }

For more info see Dump a directory as links from CGI. escapeHTML is fine for producing HTML, but not useful for URIs (your "&" is the best example). An unescaped "&" in a URI is a parameter delimiter. alex pleiner <alex@zeitform.de> zeitform Internet Dienste
Re: Re: Re: Re: CGI fails to urlencode & chars in outbound url's by BrowserUk (Patriarch) on Jun 12, 2002 at 17:01 UTC
Re^4: CGI fails to urlencode & chars in outbound url's by Anonymous Monk on Aug 01, 2005 at 13:10 UTC
Re: Re: CGI fails to urlencode & chars in outbound url's by Juerd (Abbot) on Jun 12, 2002 at 16:10 UTC
"You also may use CGI.pm's own escape function:" This is probably irrelevant to this discussion. The a() function is being used, which already escapes its arguments. Or at least CGI.pm version 2.752 does.

    2;0 juerd@ouranos:~$ perl -MCGI=a -le'print a({ -href => "&&&" }, "asdf")'
    <a href="&amp;&amp;&amp;">asdf</a>

- Yes, I reinvent wheels. - Spam: Visit eurotraQ.
Re: Re: Re: CGI fails to urlencode & chars in outbound url's by projekt21 (Friar) on Jun 12, 2002 at 16:27 UTC
But that's exactly what nobody wants, as the & of `&amp;` is treated as a parameter delimiter.

    #!/usr/bin/perl -w
    use strict;
    use CGI;
    my $q = new CGI;
    print $q->header;
    print "<pre>\n";
    print $_, "=", $q->param($_), "\n" for $q->param;
    print "\n</pre>\n";

prints

    foo=bar
    amp=
    baz=

for

    http://electra.igd.fhg.de/cgi-bin/test3.pl?foo=bar&amp;baz

but it should print

    foo=bar&baz

You need to use CGI::escape or URI::Escape but not CGI::escapeHTML. Update: Just for completeness: a() calls CGI::Util::make_attributes on each attribute, and the latter calls CGI::Util::simple_escape for escaping ("&" -> "&amp;" and some others). CGI::Util::escape does something different (see my last post). alex pleiner <alex@zeitform.de> zeitform Internet Dienste
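
The distinction between the two kinds of escaping can be sketched in core Perl as below. `url_escape` and `html_escape` are hypothetical helper names written for this illustration (the first borrows the character class from the CGI::Util::escape code quoted earlier and assumes ASCII input; the second is not CGI.pm's own routine), and `sort=name` is an invented second parameter. The idea is that the value is percent-encoded first, and the finished URL is then entity-encoded for the HTML attribute.

```perl
use strict;
use warnings;

# Percent-encode one query-string value (ASCII input assumed).
sub url_escape {
    my ($s) = @_;
    $s =~ s/([^a-zA-Z0-9_.-])/sprintf('%%%02X', ord $1)/eg;
    return $s;
}

# Entity-encode text for use inside HTML (hypothetical helper).
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

my $value = 'Chalk & Cheese';

# Layer 1: the '&' inside the value becomes %26; the '&' that separates
# the two parameters stays a literal delimiter.
my $url = 'script.pl?path=' . url_escape($value) . '&sort=name';

# Layer 2: the delimiter '&' becomes &amp; in the HTML source only;
# the browser turns it back into '&' before requesting the URL.
my $href = html_escape($url);

print qq{<a href="$href">}, html_escape($value), "</a>\n";
```

Under these assumptions the two layers do not conflict: HTML escaping protects the markup, URL escaping protects the parameter value.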
Re: CGI fails to urlencode & chars in outbound url's by Juerd (Abbot) on Jun 12, 2002 at 11:02 UTC
`http://www.somedomain/script.pl?Range=./things/Chalk%20&amp;%20Cheese/` `&amp;` in HTML is the ampersand character. That is also true within href attributes. If you don't believe me, have a look at this link: `<a href="/&amp;&amp;&amp;">`. Your code works. It is probably the space that confuses your script.pl. - Yes, I reinvent wheels. - Spam: Visit eurotraQ.
Re: Re: CGI fails to urlencode & chars in outbound url's by BrowserUk (Patriarch) on Jun 12, 2002 at 11:15 UTC
Sorry! I forgot that I don't need to entity-encode the & within the `<code>` sections. It is definitely the & that is (was! Using uri_escape() fixed it) the problem. If I can work out how to edit the node, I will correct that!

