c-era has asked for the wisdom of the Perl Monks concerning the following question:
Ok, let's try this question again. I have data that is being sent to me via HTTP POST. The data will contain & and ; characters, and they will not be escaped. There will also be no parameter name passed. I've looked it up, and because the content-type is text/xml, this IS exactly how it is supposed to be sent. They aren't sending it as a file attachment either.
Now everyone repeat after me <chant>I can't change the input, I can't change the input, I can't change the input</chant>.
My questions now: Can it be done with CGI.pm?
If it can, how?
If it can't, should it be something that is added to CGI.pm, or should another module be created to handle this?
I hope my intent in this post is clearer than it was in the last post.
P.S. Thanks perrin & CMonster.
Re: How do I read POST data that is not encoded, and was submitted without a parameter name
by robin (Chaplain) on Dec 14, 2001 at 23:35 UTC
This question seems to have generated a lot of confusion,
so I'm going to go into more detail than might seem necessary.
In a way, the problem is historical. When Tim Berners-Lee first
introduced the WWW, he included a facility for making queries of
a remote index. (Remember the <isindex> tag?) In fact, you
can see from the first
announcement that the index facility was there right from the start.
The way it worked was that if you went to a page with an <isindex> tag,
a text box would appear into which you could type a list of keywords. When you typed
"something" in and submitted it, the browser would request the same URL with ?something
appended to it. If there were several keywords, they were separated by + signs.
As the web grew, features were added to the software and the protocols. By 1992,
the protocol which would become HTTP/1.0 existed in outline. A
POST method was introduced,
and various uses were envisaged for it. Forms were proposed in 1993 in
Dave Raggett's HTML+
proposal, with the idea of using foo=bar&baz=quux in the query
string as a way of encoding the form data. He also mentions the possibility
of encoding the form data in other ways, and sending it in the body of the
request.
Fill-out forms were introduced in
Mosaic 2.0
which supported POST requests, but only with the application/x-www-form-urlencoded
encoding.
Early web servers provided custom mechanisms for supporting searchable indexes and (later)
fill-out forms. At the end of 1993, the Common Gateway Interface (CGI) was
introduced, and implemented in NCSA httpd 1.0. The CGI hasn't changed very much since then
either. The basic idea of the gateway interface is that an interface between the web
and some other system can be created by writing an ordinary Unix program, which is
associated with a URL. Information about the request is passed in environment variables,
and the body of the request can be read from standard input. The result document is
simply printed to standard output, preceded by a handful of headers.
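To make the interface concrete, here is a minimal sketch of such a gateway program in Perl. REQUEST_METHOD, CONTENT_TYPE, CONTENT_LENGTH and GATEWAY_INTERFACE are standard CGI environment variables; the helper name cgi_response and the plain-text report it returns are assumptions made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the reply from the request environment and body.
# A CGI reply is a few header lines, a blank line, then the document.
sub cgi_response {
    my ($env, $body) = @_;
    my $method = $env->{REQUEST_METHOD} || 'GET';
    my $type   = $env->{CONTENT_TYPE}   || '(none)';
    return "Content-Type: text/plain\n\n"
         . "method=$method type=$type body_bytes=" . length($body) . "\n";
}

# Only act as a gateway program when a web server actually invoked us.
if ($ENV{GATEWAY_INTERFACE}) {
    my $body = '';
    # The request body, if there is one, arrives on standard input.
    read STDIN, $body, $ENV{CONTENT_LENGTH} || 0;
    print cgi_response(\%ENV, $body);
}
```

Nothing in this sketch cares what the body contains, which is the point: at the gateway level a text/xml body is just bytes on standard input.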
So by the end of 1993 we had NCSA Mosaic with its support for fill-out forms, and
we had the NCSA httpd with its support for the CGI. It was a powerful combination,
and in my opinion it was that combination which propelled Perl to stardom. Perl
was already installed on a lot of Unix systems because it was useful for automating
system administration tasks, and it very quickly became the language of choice for
writing CGI programs, because of its powerful text processing capabilities.
But of course Mosaic (and later Netscape 1.0) only supported
the application/x-www-form-urlencoded encoding for POST requests, so that
was the important thing to deal with. Later on (in Netscape 1.1) a file upload feature
was added, which uses a different MIME-like encoding called multipart/form-data.
To this day, those two encodings are the only ones mentioned in the
HTML specification
in the context of form submission.
Now that XML is becoming firmly established as the data transfer format of choice,
people are starting to transfer XML directly over HTTP. This document
discusses different ways of achieving that, and recommends that the XML data be sent directly
in the request body using the application/xml content type. SOAP can also be sent
in HTTP, and the spec says
that requests should be sent as HTTP requests using the text/xml media type.
So, sending XML data in an HTTP POST request is perfectly legal HTTP,
and it is supported by the Common Gateway Interface. It's also the
recommended way of transferring XML data over HTTP. The only problem is that it's not
directly supported by CGI client libraries such as CGI.pm. That doesn't mean that you
necessarily shouldn't use CGI.pm in such an application - the methods that it provides
for generating responses might be useful, for example - but you can't expect it to
decode the data for you. On the other hand, there are plenty of good modules for dealing
with XML data, and this is just XML data after all. Check out the
Perl-xml mailing list
and the many XML modules on CPAN.
I hope this makes the situation slightly clearer.
Re: How do I read POST data that is not encoded, and was submitted without a parameter name
by sevensven (Pilgrim) on Dec 15, 2001 at 00:20 UTC
Great post Robin, thanks for the memory lane ;^)
Now, in my previous post to c-era's previous question, I said that you could use CGI.pm (using the sample upload script that you were providing) and that was true, but only if the POST data came encoded, which it did, since LWP::UserAgent plays nice with CGI and encodes the data.
CGI.pm cannot be used directly to solve your problem because it uses & and ; as CGI parameter separators, and it will get confused by your XML sample file (as you can see if you send the CGI instance into Dumper).
The code in CGI.pm that prevents its use is :
sub parse_params {
    my($self,$tosplit) = @_;
    my(@pairs) = split(/[&;]/,$tosplit);
    my($param,$value);
    foreach (@pairs) {
        ($param,$value) = split('=',$_,2);
        $value = '' unless defined $value;
        $param = unescape($param);
        $value = unescape($value);
        $self->add_parameter($param);
        push (@{$self->{$param}},$value);
    }
}
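To see the confusion without involving CGI.pm at all, the splitting step can be imitated in a few lines. This is only a sketch: naive_pairs is a hypothetical stand-in that copies the two split calls and leaves out unescape, and the sample XML string is invented for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the splitting logic of parse_params
# (the unescape step is deliberately left out).
sub naive_pairs {
    my ($tosplit) = @_;
    my @out;
    foreach my $pair (split /[&;]/, $tosplit) {
        my ($param, $value) = split /=/, $pair, 2;
        $param = '' unless defined $param;
        $value = '' unless defined $value;
        push @out, [$param, $value];
    }
    return @out;
}

# Invented sample body: one element, one attribute, one entity.
my $xml = '<note to="Bob">Fish &amp; chips</note>';
printf "param=[%s] value=[%s]\n", @$_ for naive_pairs($xml);
```

The single XML document comes back as three bogus parameters, cut at the & and the ; of the entity, with the first one additionally split at the = inside the attribute.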
Now, I do agree that CGI.pm gives you a nice framework with things like carping to browser, etc., and since I said previously that CGI.pm could be used, I'll make it behave :^)
You can create a new module, let us call it xmlCGI.pm, which inherits from CGI.pm and redefines parse_params
xmlCGI.pm could be something like this :
package xmlCGI;
use CGI;
@ISA = ("CGI");

sub parse_params {
    my($self,$tosplit) = @_;
    #*** place all the input into a param named xml
    push (@{$self->{'xml'}}, $tosplit);
    #*** notice tosplit turned to empty
    #*** defusing the rest of this code
    $tosplit = '';
    my(@pairs) = split(/[&;]/,$tosplit);
    my($param,$value);
    foreach (@pairs) {
        ($param,$value) = split('=',$_,2);
        $value = '' unless defined $value;
        $param = unescape($param);
        $value = unescape($value);
        $self->add_parameter($param);
        push (@{$self->{$param}},$value);
    }
}
1;
Every CGI script that uses xmlCGI.pm will receive a parameter named xml containing all the POST data.
Now, as with my previous post, I've got working code to back my claims ;^)
-- sevensven or nana korobi, ya oki
Re: How do I read POST data that is not encoded, and was submitted without a parameter name
by isotope (Deacon) on Dec 14, 2001 at 21:38 UTC
open(OUTPUT, "> test.txt")
    or die "test.txt: $!";
# copy everything arriving on standard input, one line at a time
while (<STDIN>) {
    print OUTPUT $_;
}
close(OUTPUT);
And see what's coming in on stdin. The problem should be much clearer, and CGI.pm is not the answer, since your input is not conformant with CGI standards.
Update: Ok, after re-reading your first post, I see that you've already tried this. Now I'll attempt to answer your questions directly. You have the right idea with that sample code, and no, CGI.pm won't help you one bit. You're not dealing with true CGI data. CGI.pm shouldn't be updated to deal with this, because your application isn't compliant with the CGI specifications, which, circularly, is why CGI.pm won't help you out.
--isotope
http://www.skylab.org/~isotope/
Re: How do I read POST data that is not encoded, and was submitted without a parameter name
by AidanLee (Chaplain) on Dec 14, 2001 at 20:08 UTC
Reading RAW POST data
by johanvdb (Beadle) on Dec 15, 2001 at 05:01 UTC
A simple way to read raw POST data is the following code excerpt:
sub _read_content {
    my ($self) = @_;
    my $length = $ENV{CONTENT_LENGTH};
    my $buf;
    read STDIN, $buf, $length;
    return $buf;
}
I use it to read in an XML-RPC request, which I process with
Frontier::RPC2. But you can use it to read in any XML POST data
(no multipart forms ...) and do some processing on it.
Johan
AFAIK read only tries to read the specified amount of data from the filehandle and can actually return less. Correct code should use a loop like:
sub _read_content {
    my $length = $ENV{CONTENT_LENGTH} || 0;
    my $offset = 0;     # bytes read so far
    my $buf    = '';
    while ($offset < $length) {
        # append at $offset, asking only for what is still missing
        my $read = read STDIN, $buf, $length - $offset, $offset;
        die "Can't read from a stream: $!"
            unless defined $read;
        return $buf if $read == 0;  # EOF before CONTENT_LENGTH bytes arrived
        $offset += $read;
    }
    return $buf;
}
--
Ilya Martynov
(http://martynov.org/)
In the general case, this code would be very fragile. Servers can lie about content length; things can go wrong. You should try to read a certain number of times, possibly giving up after a series of consecutive reads that draw zero bytes, and/or return all that you have after a given amount of time.
update to Ilya's response: I should clarify my statement. Several years ago I had that sort of code running in a script, and I came to grief over the problem of content length. It didn't always correspond to what I received. I no longer have access to the code, so I can't go and look it up, but in a nutshell I ignored the content-length value, and just tried to read as much as I could in a certain time frame (45 seconds IIRC).
That said, I'm willing to believe that servers these days are much more reliable, and produce accurate values for content length... although I think I'll always mistrust them.
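Since the original code is gone, here is a rough sketch of that approach rather than a reconstruction of it. The helper name read_until_eof, the 8192-byte chunk size and the use of alarm for the time limit are all assumptions; it also assumes the server closes standard input once the body has been sent, which is exactly why the time limit is there as a fallback.

```perl
use strict;
use warnings;

# Hypothetical helper: ignore CONTENT_LENGTH, read until EOF or until
# $timeout seconds have passed, and return whatever has arrived.
sub read_until_eof {
    my ($fh, $timeout) = @_;
    my $buf = '';
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        while (1) {
            # append the next chunk to the end of $buf
            my $n = read $fh, $buf, 8192, length $buf;
            die "Can't read from a stream: $!\n" unless defined $n;
            last if $n == 0;    # EOF
        }
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;                    # make sure no alarm is left pending
    # a timeout is not fatal here: hand back what we have so far
    die $err if !$ok && $err ne "timeout\n";
    return $buf;
}

# In a CGI script this might be called as:
#   my $xml = read_until_eof(\*STDIN, 45);
```

The trade-off is that a truncated body is returned silently, so the caller still has to check that what came back is a complete document.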
--g r i n d e r
just another bofh
print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u';