Update: As of 2007/11/30, CGI v3.31 has the patch to accept PUT data correctly. Thanks to prodding from rhesa, CGI and CGI::Simple now support all the HTTP methods necessary to build REST services.
I finally started building a true REST web service when I ran smack into a wall. The service is your basic CRUD service, where HTTP GET retrieves products, DELETE deletes, POST updates, and PUT creates:
GET http://foo.com/webservice/<productid>
DELETE http://foo.com/webservice/<productid>
POST http://foo.com/webservice/<productid>
PUT http://foo.com/webservice
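The mapping above can be sketched as a small dispatch table keyed on the request method. This is only an illustration, not the code the service actually uses: `action_for_request` and the handler names `retrieve` and `remove` are made up for the example, while `update` and `create` match the handlers shown further down.

```perl
use strict;
use warnings;

# Hypothetical mapping from HTTP method to handler name.
# 'retrieve' and 'remove' are illustrative names only.
my %action_for = (
    GET    => 'retrieve',
    DELETE => 'remove',
    POST   => 'update',
    PUT    => 'create',
);

# Return the handler name for a method, or undef if it is not supported.
sub action_for_request {
    my ($method) = @_;
    return $action_for{ uc( $method // '' ) };
}

# Under CGI the web server sets REQUEST_METHOD; default to GET otherwise.
my $action = action_for_request( $ENV{REQUEST_METHOD} // 'GET' );
$action = 'unsupported' unless defined $action;
print "$action\n";
```

Keeping the mapping in one hash makes it easy to see which methods the service answers and to reject the rest in a single place.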
Nothing out of the ordinary there, right? The thing is, I'm a big fan of CGI::Application, and it uses CGI at its core (but that's overridable). The wall is the way CGI handles the PUT method (it doesn't, really) and the way it handles POST: it's designed for HTML form parsing. No problem, I thought. CGI::Application can swap out CGI for any other module, as long as that module adheres to the CGI interface (well, not the entire interface).
So I needed a module that would:
- adhere to the CGI interface
- support the HTTP PUT method
- not parse PUT and POST data as form fields
After much searching, I couldn't find a module for those needs. The closest I came was CGI::XMLpost, but that wasn't even horseshoe close.
I finally decided to build one myself, but given the nature of CGI, I was pretty sure it wasn't going to be quick and it wasn't going to be pretty. I had used CGI::Simple in the past and started wondering whether there was a way to co-opt it into what I wanted.
After looking at the code, I figured out that all I needed to do was override its _read_parse method and then add accessors for the POST and PUT data. Only four changes were needed:
- ensure the POST_MAX check is also done when the method is PUT
- also read from STDIN when the method is PUT
- don't send PUT and POST data to _parse_params
- add accessors for the PUT and POST data
package Foo::CGI::Rest;
use base 'CGI::Simple';

sub _read_parse {
    my $self   = shift;
    my $data   = '';
    my $type   = $ENV{'CONTENT_TYPE'}   || 'No CONTENT_TYPE received';
    my $length = $ENV{'CONTENT_LENGTH'} || 0;
    my $method = $ENV{'REQUEST_METHOD'} || 'No REQUEST_METHOD received';

    # change #1 - added or "PUT" here ... we don't want
    # malicious PUTs either
    # first check POST_MAX Steve Purkis pointed out the previous bug
    if( ( $method eq 'POST' or $method eq "PUT" )
        and $self->{'.globals'}->{'POST_MAX'} != -1
        and $length > $self->{'.globals'}->{'POST_MAX'}) {
        $self->cgi_error(
            "413 Request entity too large: $length bytes on STDIN exceeds \$POST_MAX!"
        );
        # silently discard data ??? better to just close the socket ???
        while ($length > 0) {
            last unless sysread(STDIN, my $buffer, 4096);
            $length -= length($buffer);
        }
        return;
    }

    if( $length and $type =~ m|^multipart/form-data|i ) {
        my $got_length = $self->_parse_multipart;
        if( $length != $got_length ) {
            $self->cgi_error("500 Bad read on multipart/form-data! wanted $length, got $got_length");
        }
    # change #2 - or "PUT" here too
    } elsif( $method eq 'POST' or $method eq 'PUT' ) {
        if( $length ) {
            # we may not get all the data we want with a single read on large
            # POSTs as it may not be here yet! Credit Jason Luther for patch
            # CGI.pm < 2.99 suffers from same bug
            sysread(STDIN, $data, $length);
            while( length($data) < $length ) {
                last unless sysread(STDIN, my $buffer, 4096);
                $data .= $buffer;
            }
            # change #3 - don't send data to parse params ... it's not form data
            if( $length == length $data ) {
                $self->set_data( $data );
            } else {
                $self->cgi_error("500 Bad read on POST! wanted $length, got " . length($data));
            }
        }
    } elsif( $method eq 'GET' or $method eq 'HEAD' ) {
        $data =
            $self->{'.mod_perl'}
          ? $self->_mod_perl_request()->args()
          : $ENV{'QUERY_STRING'}
          || $ENV{'REDIRECT_QUERY_STRING'}
          || '';
        $self->_parse_params($data);
    } else {
        unless ($self->{'.globals'}->{'DEBUG'}
            and $data = $self->read_from_cmdline()) {
            $self->cgi_error("400 Unknown method $method");
        }
    }
}

# change #4 - create accessors
sub set_data {
    my( $self, $data ) = @_;
    $self->{_data} = $data;
}

sub get_data {
    my( $self ) = @_;
    return $self->{_data};
}

1;
Now in my CGI::Application all I have to do is:
sub cgiapp_get_query {
    my $self = shift;
    require Foo::CGI::Rest;
    return Foo::CGI::Rest->new();
}
and in my handlers for POST and PUT:
sub update {
    my $self = shift;
    my $cgi = $self->query();
    my $xmlstr = $cgi->get_data();
    ...
}

sub create {
    my $self = shift;
    my $cgi = $self->query();
    my $xmlstr = $cgi->get_data();
    ...
}
The thing that worries me, though, is that REST has been around a few years now and CGI has been around forever. So any ideas why CGI and its derivatives treat PUT like a red-headed stepchild?
Update: Updated title.
Update: Given the great feedback from rhesa, I've further simplified the _read_parse override: it now sets the POSTDATA or PUTDATA param when the POSTed or PUT data is not of type 'application/x-www-form-urlencoded', and form-parses it otherwise. Hmmm, maybe I should submit this as a patch to the CGI::Simple author.
Here it is in all its simplicity:
package Foo::CGI::Rest;
use base 'CGI::Simple';

sub _read_parse {
    my $self   = shift;
    my $data   = '';
    my $type   = $ENV{'CONTENT_TYPE'}   || 'No CONTENT_TYPE received';
    my $length = $ENV{'CONTENT_LENGTH'} || 0;
    my $method = $ENV{'REQUEST_METHOD'} || 'No REQUEST_METHOD received';

    # first check POST_MAX Steve Purkis pointed out the previous bug
    if( ( $method eq 'POST' or $method eq "PUT" )
        and $self->{'.globals'}->{'POST_MAX'} != -1
        and $length > $self->{'.globals'}->{'POST_MAX'}) {
        $self->cgi_error(
            "413 Request entity too large: $length bytes on STDIN exceeds \$POST_MAX!"
        );
        # silently discard data ??? better to just close the socket ???
        while ($length > 0) {
            last unless sysread(STDIN, my $buffer, 4096);
            $length -= length($buffer);
        }
        return;
    }

    if( $length and $type =~ m|^multipart/form-data|i ) {
        my $got_length = $self->_parse_multipart;
        if( $length != $got_length ) {
            $self->cgi_error("500 Bad read on multipart/form-data! wanted $length, got $got_length");
        }
    } elsif( $method eq 'POST' or $method eq 'PUT' ) {
        if( $length ) {
            # we may not get all the data we want with a single read on large
            # POSTs as it may not be here yet! Credit Jason Luther for patch
            # CGI.pm < 2.99 suffers from same bug
            sysread(STDIN, $data, $length);
            while( length($data) < $length ) {
                last unless sysread(STDIN, my $buffer, 4096);
                $data .= $buffer;
            }
            if( $length == length $data ) {
                if( $type !~ m|^application/x-www-form-urlencoded| ) {
                    $self->_add_param( $method . "DATA", $data );
                } else {
                    $self->_parse_params( $data );
                }
            } else {
                $self->cgi_error("500 Bad read on POST! wanted $length, got " . length($data));
            }
        }
    } elsif( $method eq 'GET' or $method eq 'HEAD' ) {
        $data =
            $self->{'.mod_perl'}
          ? $self->_mod_perl_request()->args()
          : $ENV{'QUERY_STRING'}
          || $ENV{'REDIRECT_QUERY_STRING'}
          || '';
        $self->_parse_params($data);
    } else {
        unless ($self->{'.globals'}->{'DEBUG'}
            and $data = $self->read_from_cmdline()) {
            $self->cgi_error("400 Unknown method $method");
        }
    }
}

1;
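With this version the raw body is no longer fetched through get_data() but through the ordinary param() call, under the name "POSTDATA" or "PUTDATA". A minimal sketch of how a handler might read it, assuming the query object exposes param() the way CGI::Simple does: `raw_body` is a hypothetical helper, and `Demo::Query` is a stand-in class that exists only so the example runs without a web server.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch the raw body that the override stored under
# "POSTDATA" or "PUTDATA". Works with any query object that has param().
sub raw_body {
    my ($query) = @_;
    my $method = uc( $ENV{REQUEST_METHOD} // '' );
    return unless $method eq 'POST' or $method eq 'PUT';
    return scalar $query->param( $method . 'DATA' );
}

# Stand-in query object (an assumption for the demo, not CGI::Simple itself).
package Demo::Query {
    sub new   { my ( $class, %params ) = @_; return bless {%params}, $class }
    sub param { my ( $self, $name ) = @_; return $self->{$name} }
}

# Simulate a PUT request carrying an XML body.
local $ENV{REQUEST_METHOD} = 'PUT';
my $query  = Demo::Query->new( PUTDATA => '<product/>' );
my $xmlstr = raw_body($query);
print "$xmlstr\n";
```

In a real handler, `$query` would be whatever `$self->query()` returns, and form-encoded bodies would still show up as individual params rather than under POSTDATA or PUTDATA.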
Update: Submitted a patch to the author of CGI::Simple