PerlMonks  

How do I read POST data that is not encoded, and was submitted without a parameter name

by c-era (Curate)
on Dec 14, 2001 at 19:45 UTC ([id://131978])

c-era has asked for the wisdom of the Perl Monks concerning the following question:

OK, let's try this question again. I have data that is being sent to me via HTTP POST. The data will contain & and ;, and they will not be escaped. There will also be no parameter name passed. I've looked it up, and because the content-type is text/xml, this IS exactly how it is supposed to be sent. They aren't sending it as a file attachment either.

Now everyone repeat after me: <chant>I can't change the input, I can't change the input, I can't change the input</chant>.

My questions now:
Can it be done with CGI.pm?
If it can, how?
If it can't, should it be something that is added to CGI.pm, or should another module be created to handle this?

I hope my intent in this post is clearer than it was in the last post.

P.S. Thanks perrin & CMonster.

Re: How do I read POST data that is not encoded, and was submitted without a parameter name
by robin (Chaplain) on Dec 14, 2001 at 23:35 UTC
    This question seems to have generated a lot of confusion, so I'm going to go into more detail than might seem necessary.

    In a way, the problem is historical. When Tim Berners-Lee first introduced the WWW, he included a facility for making queries of a remote index. (Remember the <isindex> tag?) In fact, you can see from the first announcement that the index facility was there right from the start.

    The way it worked was that if you went to a page with an <isindex> tag, a text box would appear into which you could type a list of keywords. When you typed "something" and submitted it, the browser requested the same URL with ?something appended to it. If there were several keywords, they were separated by + signs.
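    To make that convention concrete, here is a minimal Perl sketch of decoding such a query string, assuming keywords are separated by + and may contain %XX escapes. `isindex_keywords` is a hypothetical helper name, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: split an isindex-style query string such as
# "perl+cgi+xml" into its keywords, then undo %XX escapes in each one.
sub isindex_keywords {
    my ($query) = @_;
    return () unless defined $query && length $query;
    my @words = split /\+/, $query;
    s/%([0-9A-Fa-f]{2})/chr hex $1/ge for @words;
    return @words;
}

my @kw = isindex_keywords('perl+cgi+%3Cxml%3E');
print join(',', @kw), "\n";    # prints perl,cgi,<xml>
```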

    As the web grew, features were added to the software and the protocols. By 1992, the protocol which would become HTTP/1.0 existed in outline. A POST method was introduced, and various uses were envisaged for it. Forms were proposed in 1993 in Dave Raggett's HTML+ proposal, with the idea of using foo=bar&baz=quux in the query string as a way of encoding the form data. He also mentions the possibility of encoding the form data in other ways, and sending it in the body of the request.
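    A rough sketch of decoding that foo=bar&baz=quux style (what became application/x-www-form-urlencoded) might look like the following. It also accepts ; as a separator, as CGI.pm does; `parse_urlencoded` is a hypothetical name, and this is not CGI.pm's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper: decode foo=bar&baz=quux style form data into a
# hash of array refs (a name may appear more than once).
sub parse_urlencoded {
    my ($data) = @_;
    my %params;
    for my $pair (split /[&;]/, $data) {
        next unless length $pair;                  # skip empty pieces like "&&"
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($name, $value) {                      # aliases both variables
            tr/+/ /;                               # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;     # %XX escapes
        }
        push @{ $params{$name} }, $value;
    }
    return \%params;
}

my $p = parse_urlencoded('foo=bar&baz=hello+world%21&foo=2');
print "$_ => @{ $p->{$_} }\n" for sort keys %$p;
```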

    Fill-out forms were introduced in Mosaic 2.0, which supported POST requests, but only with the application/x-www-form-urlencoded encoding.

    Early web servers provided custom mechanisms for supporting searchable indexes and (later) fill-out forms. At the end of 1993, the Common Gateway Interface (CGI) was introduced, and implemented in NCSA httpd 1.0. The CGI hasn't changed very much since then either. The basic idea of the gateway interface is that an interface between the web and some other system can be created by writing an ordinary Unix program, which is associated with a URL. Information about the request is passed in environment variables, and the body of the request can be read from standard input. The result document is simply printed to standard output, preceded by a handful of headers.
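    Here is a minimal Perl sketch of that contract: request metadata from environment variables, the body from standard input, and headers followed by the document on standard output. `handle_request` is a hypothetical name; passing the environment and filehandles in as arguments is only so it can be exercised without a web server.

```perl
use strict;
use warnings;

# Minimal CGI sketch: metadata in environment variables, body on
# standard input, response (headers, blank line, document) on stdout.
sub handle_request {
    my ($env, $in, $out) = @_;
    my $method = $env->{REQUEST_METHOD} || 'GET';
    my $type   = $env->{CONTENT_TYPE}   || '(none)';
    my $length = $env->{CONTENT_LENGTH} || 0;

    # A single read() for brevity; read() can return fewer bytes
    # than requested, so robust code should loop.
    my $body = '';
    read($in, $body, $length) if $method eq 'POST' && $length > 0;

    print {$out} "Content-Type: text/plain\r\n\r\n";
    print {$out} "Method: $method\nContent-Type: $type\nBytes read: ",
        length($body), "\n";
}

handle_request(\%ENV, \*STDIN, \*STDOUT) unless caller;
```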

    So by the end of 1993 we had NCSA Mosaic with its support for fill-out forms, and we had the NCSA httpd with its support for the CGI. It was a powerful combination, and in my opinion it was that combination which propelled Perl to stardom. Perl was already installed on a lot of Unix systems because it was useful for automating system administration tasks, and it very quickly became the language of choice for writing CGI programs, because of its powerful text processing capabilities.

    But of course Mosaic (and later Netscape 1.0) only supported the application/x-www-form-urlencoded encoding for POST requests, so that was the important thing to deal with. Later on (in Netscape 1.1) a file upload feature was added, which uses a different MIME-like encoding called multipart/form-data. To this day, those two encodings are the only ones mentioned in the HTML specification in the context of form submission.
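    Since those two encodings are the only ones defined for form submission, a script can look at the Content-Type before deciding how to decode the body. A minimal sketch, assuming a hypothetical `body_kind` helper that ignores parameters such as charset or boundary:

```perl
use strict;
use warnings;

# Hypothetical helper: classify a POST body by its media type.
# Anything that is not one of the two form encodings is "raw".
sub body_kind {
    my ($content_type) = @_;
    return 'none' unless defined $content_type && length $content_type;
    # Drop parameters such as "; charset=utf-8" or "; boundary=...".
    my ($media) = split /;/, $content_type, 2;
    $media = lc $media;
    $media =~ s/^\s+|\s+$//g;
    return 'urlencoded' if $media eq 'application/x-www-form-urlencoded';
    return 'multipart'  if $media eq 'multipart/form-data';
    return 'raw';
}

print body_kind($_), "\n"
    for 'application/x-www-form-urlencoded',
        'multipart/form-data; boundary=xyz',
        'text/xml; charset=utf-8';
```

    A text/xml body like the one in the question falls into the "raw" case, which is exactly the case form decoders are not built for.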

    Now that XML is becoming firmly established as the data transfer format of choice, people are starting to transfer XML directly over HTTP. This document discusses different ways of achieving that, and recommends that the XML data be sent directly in the request body using the application/xml content type. SOAP can also be sent in HTTP, and the spec says that requests should be sent as HTTP requests using the text/xml media type.

    So, sending XML data in an HTTP POST request is perfectly legal HTTP, and it is supported by the Common Gateway Interface. It's also the recommended way of transferring XML data over HTTP. The only problem is that it's not directly supported by CGI client libraries such as CGI.pm. That doesn't mean that you necessarily shouldn't use CGI.pm in such an application - the methods that it provides for generating responses might be useful, for example - but you can't expect it to decode the data for you. On the other hand, there are plenty of good modules for dealing with XML data, and this is just XML data after all. Check out the Perl-xml mailing list and the many XML modules on CPAN.
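    A minimal sketch of that approach, assuming the body should be handed unchanged to whichever XML module you choose (for example XML::Parser or XML::LibXML from CPAN). `read_xml_body` is a hypothetical helper, and the media-type check (text/xml, application/xml, or a +xml type) is an assumption about what you want to accept:

```perl
use strict;
use warnings;

# Hypothetical helper: accept a POST whose body is XML and return it
# untouched. Nothing here is part of CGI.pm.
sub read_xml_body {
    my ($env, $in) = @_;
    my ($media) = split /;/, ($env->{CONTENT_TYPE} || ''), 2;
    $media = lc($media // '');
    $media =~ s/^\s+|\s+$//g;
    die "not an XML body: '$media'\n"
        unless $media =~ m{^(?:text|application)/(?:[\w.\-]+\+)?xml$};

    my $length = $env->{CONTENT_LENGTH} || 0;
    binmode $in;                        # XML declares its own encoding
    my $xml = '';
    while (length($xml) < $length) {
        my $n = read $in, $xml, $length - length($xml), length($xml);
        die "read failed: $!\n" unless defined $n;
        last if $n == 0;                # EOF before CONTENT_LENGTH bytes
    }
    return $xml;
}
```

    In a CGI script it would be called as `read_xml_body(\%ENV, \*STDIN)`; the & and ; characters come through as part of the XML rather than being treated as separators.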

    I hope this makes the situation slightly clearer.

Re: How do I read POST data that is not encoded, and was submitted without a parameter name
by sevensven (Pilgrim) on Dec 15, 2001 at 00:20 UTC

    Great post, Robin, thanks for the trip down memory lane ;^)

    Now, in my reply to c-era's previous question, I said that you could use CGI.pm (with the sample upload script that you were providing), and that was true, but only if the POST data came encoded, which it did, since LWP::UserAgent plays nice with CGI and encodes the data.

    CGI.pm cannot be used directly to solve your problem because it uses & and ; as CGI parameter separators, and it will get confused by your XML sample file (as you can see if you pass the CGI instance to Data::Dumper).

    The code in CGI.pm that prevents its use is:

    sub parse_params {
        my($self,$tosplit) = @_;
        my(@pairs) = split(/[&;]/,$tosplit);
        my($param,$value);
        foreach (@pairs) {
            ($param,$value) = split('=',$_,2);
            $value = '' unless defined $value;
            $param = unescape($param);
            $value = unescape($value);
            $self->add_parameter($param);
            push (@{$self->{$param}},$value);
        }
    }
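    To see why, here is a small illustration (not CGI.pm itself) that runs an XML fragment through the same two split steps, skipping the unescape calls:

```perl
use strict;
use warnings;

# Feed an XML body through split(/[&;]/) and split('=', $_, 2),
# the same steps parse_params uses, and record what comes out.
my $xml = '<link href="/cgi?a=1&amp;b=2">x; y</link>';

my @params;
for my $pair (split /[&;]/, $xml) {
    my ($param, $value) = split /=/, $pair, 2;
    $value = '' unless defined $value;
    push @params, [$param, $value];
    print "name=[$param] value=[$value]\n";
}
```

    The single element comes out as four unrelated "parameters": `<link href`, `amp`, `b` and ` y</link>`.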

    Now, I do agree that CGI.pm gives you a nice framework, with things like carping to the browser, etc., and since I said previously that CGI.pm could be used, I'll make it behave :^)

    You can create a new module, let us call it xmlCGI.pm, which inherits from CGI.pm and redefines parse_params.

    xmlCGI.pm could be something like this:

    package xmlCGI;
    use CGI;
    @ISA = ("CGI");

    sub parse_params {
        my($self,$tosplit) = @_;
        #*** place all the input into a param named xml
        push (@{$self->{'xml'}}, $tosplit);
        #*** notice tosplit turned to empty
        #*** defusing the rest of this code
        $tosplit = '';
        my(@pairs) = split(/[&;]/,$tosplit);
        my($param,$value);
        foreach (@pairs) {
            ($param,$value) = split('=',$_,2);
            $value = '' unless defined $value;
            $param = unescape($param);
            $value = unescape($value);
            $self->add_parameter($param);
            push (@{$self->{$param}},$value);
        }
    }
    1;

    Every CGI script that uses xmlCGI.pm will receive a parameter named xml containing all the POST data.

    Now, as with my previous post, I've got working code to back my claims ;^)

    -- sevensven or nana korobi, ya oki
Re: How do I read POST data that is not encoded, and was submitted without a parameter name
by isotope (Deacon) on Dec 14, 2001 at 21:38 UTC
    Start with this:
    open(OUTPUT, '>', 'test.txt') or die "test.txt: $!";
    for (<STDIN>) {
        print OUTPUT $_;
    }
    close(OUTPUT);
    And see what's coming in on stdin. The problem should be much clearer, and CGI.pm is not the answer, since your input is not conformant with CGI standards.

    Update: Ok, after re-reading your first post, I see that you've already tried this. Now I'll attempt to answer your questions directly. You have the right idea with that sample code, and no, CGI.pm won't help you one bit. You're not dealing with true CGI data. CGI.pm shouldn't be updated to deal with this, because your application isn't compliant with the CGI specifications, which, circularly, is why CGI.pm won't help you out.

    --isotope
    http://www.skylab.org/~isotope/
Re: How do I read POST data that is not encoded, and was submitted without a parameter name
by AidanLee (Chaplain) on Dec 14, 2001 at 20:08 UTC
    The CGI specification (not CGI.pm, mind you. CGI.) states that input to a CGI script comes in on STDIN, so you ought to be able to grab the data from there.

    Try this for more information.

Reading RAW POST data
by johanvdb (Beadle) on Dec 15, 2001 at 05:01 UTC
    A simple way to read raw POST data is the following code excerpt
    sub _read_content {
        my ($self) = @_;
        my $length = $ENV{CONTENT_LENGTH};
        my $buf;
        read STDIN, $buf, $length;
        return $buf;
    }
    I use it to read in an XML-RPC request, which I process with Frontier::RPC2. But you can use it to read in any XML POST data (though not multipart forms) and do some processing on it.
    Johan
      AFAIK, read only tries to read the specified amount of data from the filehandle and can actually return less. Correct code should use a loop like this:
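      sub _read_content {
          my $length = $ENV{CONTENT_LENGTH} || 0;
          my $got    = 0;     # bytes read so far
          my $buf    = '';
          while ($got < $length) {
              # read() may return fewer bytes than requested, so append
              # at offset $got and keep going until $length bytes arrive
              my $read = read STDIN, $buf, $length - $got, $got;
              die "Can't read from a stream: $!" unless defined $read;
              return $buf if $read == 0;    # EOF: return what we have
              $got += $read;
          }
          return $buf;
      }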

      --
      Ilya Martynov (http://martynov.org/)

        In the general case, this code would be very fragile. Servers can lie about content length; things can go wrong. You should try to read a certain number of times, possibly giving up after a series of consecutive reads that draw zero bytes, and/or return all that you have after a given amount of time.

        update to Ilya's response: I should clarify my statement. Several years ago I had that sort of code running in a script, and I came to grief over the problem of content length. It didn't always correspond to what I received. I no longer have access to the code, so I can't go and look it up, but in a nutshell I ignored the content-length value, and just tried to read as much as I could in a certain time frame (45 seconds IIRC).

        That said, I'm willing to believe that servers these days are much more reliable, and produce accurate values for content length... although I think I'll always mistrust them.
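        A minimal sketch of the defensive approach described above: CONTENT_LENGTH is treated as an upper bound rather than a promise, reading stops after a few consecutive zero-byte reads, and a deadline (via alarm) returns whatever has arrived. The helper name and the default limits (45 seconds, 3 empty reads) are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read at most $length bytes from $fh without
# trusting $length, a run of empty reads, or a slow client.
sub read_body_defensively {
    my ($fh, $length, %opt) = @_;
    my $timeout   = $opt{timeout}   // 45;    # seconds (assumed default)
    my $max_empty = $opt{max_empty} // 3;     # assumed default
    my $buf   = '';
    my $empty = 0;

    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        while (length($buf) < $length) {
            my $n = read $fh, $buf, $length - length($buf), length($buf);
            die "read failed: $!\n" unless defined $n;
            if ($n == 0) {
                last if ++$empty >= $max_empty;   # give up on empty reads
            }
            else {
                $empty = 0;
            }
        }
        alarm 0;
        1;
    };
    alarm 0;
    # A timeout keeps the partial body; any other error is rethrown.
    die $@ if !$ok && $@ ne "timeout\n";
    return $buf;
}
```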

        --
        g r i n d e r
        just another bofh

        print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u';

Node Type: perlquestion [id://131978]
Approved by root