perlquestion
isync
Handling PUT or PROPFIND methods with CGI turned out - at least for me - to be quite a challenge while doing it under mod_cgi with common helpers like CGI.pm or CGI::Simple.
<br>
<br>
First, they do not handle less-common methods like PROPFIND well, throw away message body content by pretending to read it or deny answering them at all.
<br>Second, large message bodies are (commonly) slurped in! But PUT and POST requests can get large, even PROPFIND XML message bodies might get bloated - and all these requests might thus pose a DoS effectively.
<br>
<br>
Please find attached my take on fixing these issues. It's a WIP so please comment!
<br>
<br>Further, I'd like a knowledgeable monk to tell me if my attempts to keep STDIN content unprocessed so my script can later do a buffered read are ineffective. Does it really make any difference to keep arriving content on STDIN than copying it to a variable (I think this answers itself, didn't it?)
<br>Or is a closely knit-in Apache handler the only way to process message body data *as it arrives*? Or wouldn't even this do the trick?<br>In reality it seems to work as intended: with the code snippet at the bottom of this post I can see an uploading file growing in filesystem as data arrives.
<readmore>
<code>
package REST::CGI::Simple;
#
# patched version of CGI::Simple
# - PROPFIND method with message body possible, see keyword "patch1"
# - no PUT data value parsing for zeros (may also be set by NO_NULL = 0), but hardcoded, see keyword "patch2"
# - differentiate handling of medium and very large message bodies, see keyword "patch3"
...
sub _read_parse {
my $self = shift;
my $handle = shift || \*STDIN;
my $data = '';
my $type = $ENV{'CONTENT_TYPE'} || 'No CONTENT_TYPE received';
my $length = $ENV{'CONTENT_LENGTH'} || 0;
my $method = $ENV{'REQUEST_METHOD'} || 'No REQUEST_METHOD received';
# first check POST_MAX Steve Purkis pointed out the previous bug
if ( ( $method eq 'POST' or $method eq 'PUT' or $method eq 'PROPFIND' ) # patch1
and $self->{'.globals'}->{'POST_MAX'} != -1
and $length > $self->{'.globals'}->{'POST_MAX'} ) {
$self->cgi_error(
"413 Request entity too large: $length bytes on STDIN exceeds \$POST_MAX!"
);
# silently discard data ??? better to just close the socket ???
while ( $length > 0 ) {
last unless _internal_read( $self, $handle, my $buffer );
$length -= length( $buffer );
}
return;
}
if ( $length and $type =~ m|^multipart/form-data|i ) {
my $got_length = $self->_parse_multipart( $handle );
if ( $length != $got_length ) {
$self->cgi_error(
"500 Bad read on multipart/form-data! wanted $length, got $got_length"
);
}
return;
}
# patch3
elsif ( ( $method eq 'POST' or $method eq 'PUT' or $method eq 'PROPFIND' )
and $self->{'.globals'}->{'HANDLE_ONLY_ON_LARGE_CONTENT'} == 1
and $length > $self->{'.globals'}->{'HANDLE_ONLY_WHEN_LARGER_AS'} ) {
# reading data from STDIN here would mean taking it away from a
# script which might process it on its own, so we just signal that
# it is still there, untouched
$self->_add_param( 'HANDLE_ONLY', 1 );
$ENV{STDIN_WAITING} = 1;
}
elsif ( $method eq 'POST' or $method eq 'PUT' or $method eq 'PROPFIND' ) { # patch1
if ( $length ) {
# we may not get all the data we want with a single read on large
# POSTs as it may not be here yet! Credit Jason Luther for patch
# CGI.pm < 2.99 suffers from same bug
_internal_read( $self, $handle, $data, $length );
while ( length( $data ) < $length ) {
last unless _internal_read( $self, $handle, my $buffer );
$data .= $buffer;
}
unless ( $length == length $data ) {
$self->cgi_error( "500 Bad read on POST! wanted $length, got "
. length( $data ) );
return;
}
if ( $type !~ m|^application/x-www-form-urlencoded| ) {
$self->_add_param( $method . "DATA", $data );
}
else {
$self->_parse_params( $data );
}
}
}
elsif ( $method eq 'GET' or $method eq 'HEAD' ) {
$data
= $self->{'.mod_perl'}
? $self->_mod_perl_request()->args()
: $ENV{'QUERY_STRING'}
|| $ENV{'REDIRECT_QUERY_STRING'}
|| '';
$self->_parse_params( $data );
}
else {
unless ( $self->{'.globals'}->{'DEBUG'}
and $data = $self->read_from_cmdline() ) {
$self->cgi_error( "400 Unknown method $method" );
return;
}
unless ( $data ) {
# I liked this reporting but CGI.pm does not behave like this so
# out it goes......
# $self->cgi_error("400 No data received via method: $method, type: $type");
return;
}
$self->_parse_params( $data );
}
}
...
sub _add_param {
my ( $self, $param, $value, $overwrite ) = @_;
return () unless defined $param and defined $value;
$param =~ tr/\000//d if $self->{'.globals'}->{'NO_NULL'};
@{ $self->{$param} } = () if $overwrite;
@{ $self->{$param} } = () unless exists $self->{$param};
my @values = ref $value ? @{$value} : ( $value );
for my $value ( @values ) {
next
if $value eq ''
and $self->{'.globals'}->{'NO_UNDEF_PARAMS'};
# $value =~ tr/\000//d if $self->{'.globals'}->{'NO_NULL'}; # patch2: we could ask for NO_NULL, but this was such a headache
# that I hardcoded it as commented out
$value = Encode::decode( utf8 => $value )
if $self->{'.globals'}->{PARAM_UTF8};
push @{ $self->{$param} }, $value;
unless ( $self->{'.fieldnames'}->{$param} ) {
push @{ $self->{'.parameters'} }, $param;
$self->{'.fieldnames'}->{$param}++;
}
}
return scalar @values; # for compatibility with CGI.pm request.t
}
</code>
(Only excerpts, something prevented me from posting the whole..)
<br>
<br>
With this CGI::Simple variant in effect, I later in my code do:
<code>
...
my $fh = $fs->open_write( $path ); # Filesys::Virtual::Plain
if( $fh ){
if($ENV{STDIN_WAITING}){
File::Copy::copy(\*STDIN,$fh); # rely on File::Copy's robust buffered GLOB to HANDLE copy
}else{
print $fh $self->query->param('PUTDATA');
}
$fs->close_write($fh);
return 1;
}else{
$fs->close_write($fh);
return 0;
}
</code>
</readmore>