[Bioperl-l] Another Taxonomy modules to CPAN

Wed Nov 3 09:42:49 UTC 2010

Hi all,

I have written a couple of modules that overlap in functionality
with Bio::DB::Taxonomy and Bio::Taxon. I had to write them because
constraints in the environment where they had to run (a GRID) made it
impossible to use a Bioperl-based solution.

The main features of these modules are:

+ No dependencies on non-standard Perl modules
+ NCBI and RDP based taxonomies supported
+ Very fast, with a low memory footprint: orders of magnitude faster than
the Bioperl modules (for the tasks they are designed for).

Of course, they do not compete with Bio::DB::Taxonomy and Bio::Taxon in
completeness or integration with other tools (e.g. the rest of the
Bioperl suite), but they are handy for mapping very large datasets (for
example, BLAST results) to the NCBI or RDP taxonomy.
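
To give an idea of the kind of dependency-free lookup I mean, here is a
minimal sketch in plain Perl. It is not the code of the modules
themselves: the function names read_nodes and ancestors are made up for
this example, and the four-node tree is toy data, not a real excerpt of
the NCBI taxonomy. The only thing taken from NCBI is the nodes.dmp line
format (fields separated by tab-pipe-tab, first three fields being
tax_id, parent tax_id and rank) and the convention that the root node is
its own parent.

```perl
use strict;
use warnings;

# Parse nodes.dmp-style lines. Fields are separated by "\t|\t" and each
# line ends with "\t|". Only the first three fields are used here.
# (Hypothetical helper, for illustration only.)
sub read_nodes {
    my ($fh) = @_;
    my (%parent, %rank);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;                     # drop the trailing "\t|"
        my ($id, $parent_id, $rank) = split /\t\|\t/, $line;
        next unless defined $rank;              # skip malformed lines
        $parent{$id} = $parent_id;
        $rank{$id}   = $rank;
    }
    return (\%parent, \%rank);
}

# Walk the parent links from $id up to the root. The root is recognised
# because it is listed as its own parent.
sub ancestors {
    my ($parent, $id) = @_;
    my @path;
    while (defined $parent->{$id} && $parent->{$id} != $id) {
        $id = $parent->{$id};
        push @path, $id;
    }
    return @path;
}

# Toy data in nodes.dmp format (IDs and tree shape are invented).
my $toy = join '', map { join("\t|\t", @$_) . "\t|\n" } (
    [1,  1,  'no rank'],
    [2,  1,  'superkingdom'],
    [10, 2,  'genus'],
    [11, 10, 'species'],
);

open my $fh, '<', \$toy or die "Cannot open in-memory data: $!";
my ($parent, $rank) = read_nodes($fh);
close $fh;

print join(' -> ', 11, ancestors($parent, 11)), "\n";   # 11 -> 10 -> 2 -> 1
```

Keeping the tree as two flat hashes (child to parent, ID to rank) is
what keeps the memory use small in a sketch like this; there is one
scalar per node instead of one object per node.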

The modules are:

Taxonomy::Base -- Finds ancestors and ranks, converts between
                   names, ranks and IDs, etc.

Taxonomy::RDP  -- Reads the taxonomic tree from the RDP XML file

Taxonomy::NCBI -- Reads the taxonomic tree from flat NCBI files
                   (nodes.dmp and names.dmp)
                   (Similar to Bio::DB::Taxonomy::flatfile)

Taxonomy::NCBI::Gi2taxid -- Converts NCBI GIs to Taxids quickly
                             and efficiently.
                             Uses a binary lookup table.
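
For those wondering what a binary lookup table can look like, here is a
rough sketch of the general idea. The layout is an assumption made for
this example only (one 4-byte big-endian record per GI, stored at byte
offset GI * 4, with 0 meaning "no taxid"), and build_table / gi2taxid
are hypothetical function names, not the interface of
Taxonomy::NCBI::Gi2taxid. The GI numbers in the demo are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumed layout: record number $gi holds the taxid as a 32-bit
# big-endian unsigned integer; 0 means "no taxid known".
my $RECORD_SIZE = 4;

# Write a table from a hash of gi => taxid. Gaps between GIs are left
# as zero bytes by seeking past them before writing.
sub build_table {
    my ($file, $map) = @_;
    open my $out, '>', $file or die "Cannot write $file: $!";
    binmode $out;
    for my $gi (sort { $a <=> $b } keys %$map) {
        seek $out, $gi * $RECORD_SIZE, 0 or die "seek failed: $!";
        print {$out} pack('N', $map->{$gi});
    }
    close $out or die "Cannot close $file: $!";
}

# Look up one GI with a single seek and a 4-byte read. Nothing is
# loaded into memory. Returns nothing if the GI is not in the table.
sub gi2taxid {
    my ($fh, $gi) = @_;
    seek $fh, $gi * $RECORD_SIZE, 0 or return;
    my $n = read $fh, my $buf, $RECORD_SIZE;
    return unless defined $n && $n == $RECORD_SIZE;
    my $taxid = unpack 'N', $buf;
    return unless $taxid;
    return $taxid;
}

# Demo with invented GIs mapped to two well-known taxids
# (9606 = Homo sapiens, 562 = Escherichia coli).
my ($tmp_fh, $table) = tempfile(UNLINK => 1);
close $tmp_fh;
build_table($table, { 3 => 9606, 10 => 562 });

open my $fh, '<', $table or die "Cannot read $table: $!";
binmode $fh;
print gi2taxid($fh, 10), "\n";   # 562
```

The point of a fixed-width table is that each lookup costs one seek and
one small read regardless of how many GIs are in the file, so memory
use stays constant even for very large BLAST result sets.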

These modules are being used by several groups now -- mainly working 
with large metagenomics datasets -- and I am considering uploading them 
to CPAN, but I am not clear on where these modules should be placed there.

What do you think I should name these modules (i.e. where should they
live on CPAN)? Their natural place could be under
Bio::DB::Taxonomy, maybe Bio::DB::Taxonomy::Lite,
Bio::DB::Taxonomy::Lite::NCBI, etc. Is this possible (and
advisable) without being part of Bioperl? Any other suggestions?

Thank you very much in advance,

M;

----------------------------------------------------