RFC: CPAN.pm and CPAN::SQLite

A development version of CPAN-SQLite is making its way around CPAN. It is a set of modules that can be used to set up, maintain, and search a SQLite database of the information contained in the CPAN indices on authors, modules, and distributions. Andreas has added experimental support for this in the latest development version (1.88_65) of CPAN.pm, and I'm especially interested in hearing about others' experiences with it.

CPAN.pm gets its information on CPAN authors, modules, and distributions from the CPAN indices, and currently loads all of it into memory. With more and more packages being added to CPAN, this memory footprint can be large. CPAN::SQLite instead lets CPAN.pm fetch the information it needs for a given request through a query to a SQLite database. The result is then kept in memory so that it is readily available for the rest of the session, which means that only information the user has actually requested ends up in memory. This can represent a significant saving in memory usage: I've seen reductions from 60 MB to 20 MB, after a few random queries, on some systems I've tried this on. However, some queries need essentially all of the available information loaded into memory; one example is the cpan> r command within the CPAN.pm shell, which lists all recommended updates. If such a query is made, the memory footprint with and without CPAN-SQLite is comparable.
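To illustrate the idea of fetching on demand and keeping only what was asked for, here is a minimal sketch in plain Perl. It is not how CPAN.pm or CPAN::SQLite are actually implemented: the %database hash merely stands in for the SQLite database, and the names query_database and lookup are hypothetical.

```perl
use strict;
use warnings;

# Stand-in for the on-disk SQLite database (assumption: a real setup
# would run a query here instead of reading a Perl hash).
my %database = (
    'CPAN::SQLite' => 'CPAN-SQLite',
    'CPAN'         => 'CPAN',
    'DBI'          => 'DBI',
);

my %cache;            # holds only what was requested in this session
my $db_queries = 0;   # counts how often the "database" is consulted

# Hypothetical helper: one lookup against the database stand-in.
sub query_database {
    my ($module) = @_;
    $db_queries++;
    return $database{$module};
}

# Hypothetical helper: consult the cache first, fall back to the database.
sub lookup {
    my ($module) = @_;
    # exists() rather than defined(), so that a miss is remembered too
    $cache{$module} = query_database($module)
        unless exists $cache{$module};
    return $cache{$module};
}

lookup('DBI');
lookup('DBI');    # answered from the cache, no second query
lookup('CPAN');

printf "cached entries: %d, database queries: %d\n",
    scalar(keys %cache), $db_queries;
```

After these three calls only the two requested modules are held in memory, and the repeated lookup did not touch the database again. A request that needs every entry would fill the cache completely, which is why the saving disappears for something like the r command.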

I'd be interested in hearing, first of all, if there are any problems with building and testing the package, and secondly, if you use it with CPAN.pm, if there are any problems with various types of queries. To enable CPAN-SQLite, within the CPAN.pm shell, one can do

  cpan> o conf use_sqlite 1

and then, if you like,

  cpan> o conf commit

to keep this setting for future use. The first time this is used, the database should be created under the cpan_home entry of CPAN::Config (the same location where Metadata is found), whereas subsequent invocations should just update the database.
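
To check the current value of the setting, and to see which directory cpan_home points to, the options can be printed from the same shell; giving o conf an option name with no value should just display that option:

  cpan> o conf use_sqlite
  cpan> o conf cpan_home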

I'd also be interested in hearing any ideas for possible extensions, both for extending the search capabilities of CPAN.pm and for uses outside of CPAN.pm. Thanks!
