Ah, I see: the bottleneck is in Net::Amazon::S3, which uses XPath to extract the information it needs. list_all calls list_bucket_all, which calls list_bucket, which does the XPath dirty work.
If I were in your shoes, I might try writing an alternative to list_bucket that uses an approach other than XPath. If you look in:
http://search.cpan.org/src/PAJAS/XML-LibXML-1.65/lib/XML/LibXML/XPathContext.pm
you'll see that sub find constructs a new result object for every query:
sub find {
    my ($self, $xpath, $node) = @_;
    my ($type, @params) = $self->_guarded_find_call('_find', $xpath, $node);
    if ($type) {
        return $type->new(@params);
    }
    return undef;
}
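To make the cost concrete, here's a small illustrative snippet (the toy document is made up; the calls are XML::LibXML's documented API) that drives the find sub above through XML::LibXML::XPathContext. Every query allocates a fresh result object before you can touch the data:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Tiny stand-in document; a real S3 listing would be far larger.
my $doc = XML::LibXML->new->parse_string('<r><k>a</k><k>b</k></r>');
my $xpc = XML::LibXML::XPathContext->new($doc);

# This call goes through sub find above: _guarded_find_call runs the
# query, then $type->new(@params) builds an XML::LibXML::NodeList
# object wrapping the matches.
my $result = $xpc->find('//k');
print $result->size, "\n";    # 2
```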
This is where the OO interface of XML::LibXML::XPathContext becomes your bottleneck: every query builds a fresh result object. You could probably develop a faster interface using a streaming parser, but how much faster I don't know. You'll need some sort of optimization in there to get a faster result. Sorry I can't be of much more help.
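As a sketch of the streaming idea, assuming the element names from S3's ListBucketResult schema: XML::LibXML::Reader, the pull-parser interface that ships in the same XML::LibXML distribution, can walk the response and collect keys without constructing any XPath result objects:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical, trimmed-down S3 ListBucketResult response.
my $xml = <<'XML';
<ListBucketResult>
  <Contents><Key>foo.txt</Key><Size>10</Size></Contents>
  <Contents><Key>bar.txt</Key><Size>20</Size></Contents>
</ListBucketResult>
XML

my $reader = XML::LibXML::Reader->new(string => $xml);
my @keys;
while ($reader->read) {
    # Pull events one at a time; no node lists are built.
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT
            and $reader->name eq 'Key';
    push @keys, $reader->readInnerXml;
}
print "$_\n" for @keys;
```

Whether this actually beats the XPath path would need benchmarking against a real listing, but it at least sidesteps the per-query object construction shown above.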
UPDATE - you could also use some of the module's lower-level functions and parallelize the operation by having each CPU count x percent of the buckets. That's probably easier than speeding up the XML parsing, and it's probably your best bet for a two-times-or-better speedup. You could fork off a process for each slice that writes its results to a temp file, then add up all the results at the end. I think that may be your shortest course to victory.
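A minimal sketch of that fork-and-merge idea, using a plain list of numbers as a stand-in for per-bucket counts (the workload, worker count, and file handling are all illustrative assumptions, not anything from Net::Amazon::S3):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my @buckets = (1 .. 100);          # stand-in workload
my $workers = 4;
my $chunk   = int(@buckets / $workers);

my @files;
for my $w (0 .. $workers - 1) {
    my ($fh, $file) = tempfile(UNLINK => 0);
    push @files, $file;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {               # child: count one slice
        my $start = $w * $chunk;
        my $end   = ($w == $workers - 1) ? $#buckets : $start + $chunk - 1;
        my $total = 0;
        $total += $_ for @buckets[$start .. $end];
        print {$fh} "$total\n";    # write partial result to temp file
        close $fh;
        exit 0;
    }
    close $fh;                     # parent doesn't write
}
wait() for 1 .. $workers;          # reap all children

# Add up the partial results at the end.
my $grand = 0;
for my $file (@files) {
    open my $in, '<', $file or die "open $file: $!";
    chomp(my $n = <$in>);
    $grand += $n;
    close $in;
    unlink $file;
}
print "$grand\n";                  # prints 5050
```

In the real script each child would call the module's lower-level listing functions on its slice of the buckets instead of summing integers, but the fork / temp file / merge skeleton stays the same.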