BioPerl

Item Description: A collection of bioinformatics modules

Review Synopsis:

This is a review of the BioPerl modules. These modules used not to be available through CPAN, so you had to get them from the BioPerl website. That is no longer true: you can now use the standard CPAN shell to install BioPerl. BioPerl is a large set of modules covering several bioinformatics tasks. This will be a fairly high-level review, as 174 modules make up the set (the full install is 5.4 MB). The most recent release as of this writing is 0.7.1, and there is also a developer release, 0.9, which I have not looked at.

The prerequisites are nothing out of the ordinary: LWP, IO::String, XML::Node, and XML::Writer. BioPerl also provides interfaces to several programs and databases, so to work with those you will need to have them installed too. Bundle::BioPerl will install all of the prerequisites for you. I installed by doing the usual make tango (perl Makefile.PL, make, make test, make install), and the installation was flawless; a few tests out of over 1000 failed, but that wasn't a big deal.

There are several module groups:

Bio::AlignIO::*: Readers and writers for several alignment formats, such as clustalw and pfam.
Bio::Annotation::*: Objects for holding annotations (simple comments, links to other databases, or literature references).
Bio::DB::*: Interfaces to several databases, including GenBank, GenPept, SwissProt and several others.
Bio::Factory::*: A set of objects for instantiating Bio::SeqAnalysisParserI, which is a generic interface for sequence analysis parsers. The idea is to give parsers a generic interface so that annotation pipelines can be built and, when a new parser or program comes along, a complete rewrite is not necessary.
Bio::Index::*: Methods for indexing several types of databases.
Bio::LiveSeq::*: This is a very feature-rich DNA sequence object. Several types of annotations can be added here. There seems to be a fair bit of overlap between these modules and those in Bio::SeqFeature; it is not clear to me when, if ever, you would want to use one over the other. It may just be a matter of preference.
Bio::Location::*: Contains methods for handling location coordinates on sequences. As the documentation says, this may seem easy, but these modules deal with fuzzy or compound (split) locations, as well as rules for resolving locations, such as 'always widest range' or 'always smallest range'.
Bio::Root::*: Several utility modules that are inherited from in other modules.
Bio::Seq::*: Contains extensions for the main object for sequences, Bio::Seq, including LargeSeq for long (genomic) sequences and RichSeq for annotated sequences. Bio::Seq is the workhorse object, which holds nucleotide or protein sequences, as well as annotations. It provides several handy sequence manipulation methods such as revcom (reverse complement) and translate.
Bio::SeqFeature::*: Objects containing feature annotations of sequences; allows fairly complex relationships to be expressed between related sequences, as well as detail about individual sequences, like the locations of exons and transcripts. The list of possible options is somewhat limited, so more specific features should probably be created by subclassing the generic class.
Bio::SeqIO::*: Handles I/O streams for several sequence database types (like GenBank annotations/features, GCG and SwissProt).
Bio::Tools::*: Several items here, including result holders and parsers for several programs. The BLAST parser is worth its weight in gold.
Bio::Variation::*: These appear to be modules for working with SNPs and other mutations.
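
To make the two Bio::Seq methods named above (revcom and translate) concrete, here is a minimal sketch in plain Perl of what the two operations compute. It is not BioPerl's code and does not use BioPerl at all. The subroutine names reverse_complement and translate_dna are hypothetical, chosen for illustration, and the sketch assumes the standard genetic code, reading frame 0, and unambiguous bases (anything else becomes 'X'). With BioPerl installed you would call revcom and translate on a Bio::Seq object instead.

```perl
use strict;
use warnings;

# Reverse complement of a DNA string: reverse it, then swap A<->T and C<->G.
# (Hypothetical helper; BioPerl's own method for this is revcom.)
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;          # scalar context reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions (TTT, TTC, TTA, TTG, TCT, ...).
my @bases = qw(T C A G);
my $amino = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr($amino, $i++, 1);
        }
    }
}

# Translate a DNA string in frame 0. Unknown codons become 'X';
# a trailing partial codon is dropped. Stop codons are shown as '*'.
# (Hypothetical helper; BioPerl's own method for this is translate.)
sub translate_dna {
    my ($dna) = @_;
    $dna = uc $dna;
    my $protein = '';
    for (my $pos = 0; $pos + 3 <= length $dna; $pos += 3) {
        my $codon = substr($dna, $pos, 3);
        $protein .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $protein;
}

print reverse_complement('ATGGCCTAA'), "\n";   # TTAGGCCAT
print translate_dna('ATGGCCTAA'), "\n";        # MA*
```

The point of using the Bio::Seq object rather than helpers like these is that the sequence keeps its identifier and annotations as it is manipulated, which a bare string does not.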

In all honesty, I have used only a few of these modules. The majority of them are very specialized, so a "general practitioner" like me is unlikely to need them often. There are so many modules here that it is difficult to know if a problem you have might be addressed by BioPerl, which is why I undertook writing this review. I hope it has been helpful to you, and if you have any experience with BioPerl, please add your comments.

Special thanks to the other members of my group, and especially Ben Faga (not a monk, but still a good Perl programmer), for their input and insight while writing this review, as well as Arguile for pointing out that BioPerl is now available at CPAN.

New note 2002-05-20: I plan on bringing this up to date for BioPerl v1.0 as soon as possible.

Re: BioPerl by stajich (Chaplain) on May 21, 2002 at 20:08 UTC
Some noteworthy additions to 1.0:

Bio::SearchIO: A single parsing system for FASTA, BLAST text (WU and NCBI BLAST), and BLAST XML, all using the same API. This replaces Bio::Tools::Blast, as most of the BLAST dev effort has been moved to the new objects. SearchIO also supports pluggable writers so that reports can be rendered as HTML, text tables, or (eventually) the pseudo-standard XML.
Bio::Graphics: Rendering of sequences and features.
Bio::DB::GFF: Lincoln's fast GFF database using MySQL. It can be the back end for a GMOD instance.
Bio::Map and Bio::MapIO: For reading in map data.
Bio::Tree: For parsing and manipulating phylogenetic trees.
Bio::Biblio: For bibliographic objects and access to the EBI OpenBQS server.

See the change log for more details and other noteworthy improvements. There are also some rough class diagrams and other brief docs here.

1.01, which contains bug fixes to the SearchIO parsers and other sundry fixes, will be done by early June.

An aside, and a place where a monk could perhaps help: our module list is so large that the generated makefile will not run on IRIX and other OSes with smallish shell buffers. Using MakeMaker from 5.7.x fixes it partially, but the arg list is still not broken up enough, so "make test" can't run. We need to override the necessary methods in our Makefile.PL so that installs work on small-buffer OSes.

Scott - would still like to hear good/bad design comments and observations about 1.0 and (lack of?) functionality people need.
Re: BioPerl by tucano (Scribe) on Jun 10, 2004 at 14:38 UTC
About installation: BioPerl is now available at CPAN. Bundle::BioPerl contains several modules for the dependencies. In the CPAN shell, search with i /Bioperl/ or i /BYRNEY/ to find the module, then just follow the instructions.


	PerlMonks