[Bioperl-l] Another Taxonomy modules to CPAN
miguel.pignatelli at uv.es
Wed Nov 3 09:42:49 UTC 2010
I have written a couple of modules that overlap in functionality
with Bio::DB::Taxonomy and Bio::Taxon. I had to write them because
constraints in the environment where I had to run them (a GRID) made
it impossible to use a BioPerl-based solution.
The main features of these modules are:
+ No dependencies on non-standard Perl modules
+ NCBI and RDP based taxonomies supported
+ Very fast, with a low memory footprint -- orders of magnitude faster
than the BioPerl modules for the tasks they are designed for
Of course, they do not compete with Bio::DB::Taxonomy and Bio::Taxon in
completeness or integration with other tools (e.g. the rest of the
BioPerl suite), but they are handy for mapping very large datasets (for
example BLAST results) to the NCBI or RDP taxonomy.
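
To make the mapping use case concrete, here is a minimal sketch of the underlying idea, not the code of the modules themselves: parent links are read from lines in the NCBI nodes.dmp layout (fields separated by tab-pipe-tab) into plain hashes, and a taxid is resolved to its lineage by following those links to the root. The subroutine names load_nodes and lineage are hypothetical, and the sample records are a simplified illustration rather than a real dump.

```perl
use strict;
use warnings;

# Simplified sample records in the NCBI nodes.dmp layout: fields are
# separated by "\t|\t" and each line ends with "\t|". Only the first
# three fields (tax_id, parent tax_id, rank) are shown; real files
# carry more columns.
my @sample_nodes = (
    "1\t|\t1\t|\tno rank\t|",
    "131567\t|\t1\t|\tno rank\t|",
    "2\t|\t131567\t|\tsuperkingdom\t|",
    "1224\t|\t2\t|\tphylum\t|",
    "1236\t|\t1224\t|\tclass\t|",
);

# Build taxid => parent and taxid => rank hashes (hypothetical helper).
sub load_nodes {
    my @lines = @_;
    my (%parent, %rank);
    for my $line (@lines) {
        chomp $line;
        $line =~ s/\t\|$//;    # drop the end-of-record marker
        my ($taxid, $parent_id, $rank) = split /\t\|\t/, $line;
        $parent{$taxid} = $parent_id;
        $rank{$taxid}   = $rank;
    }
    return (\%parent, \%rank);
}

# Walk from a taxid up to the root, i.e. the node that is its own parent.
# Returns an empty list for an unknown taxid.
sub lineage {
    my ($parent, $taxid) = @_;
    my @path;
    while (defined $taxid && exists $parent->{$taxid}) {
        push @path, $taxid;
        last if $parent->{$taxid} == $taxid;    # reached the root
        $taxid = $parent->{$taxid};
    }
    return @path;
}

my ($parent, $rank) = load_nodes(@sample_nodes);
print join(" -> ", lineage($parent, 1236)), "\n";
# prints: 1236 -> 1224 -> 2 -> 131567 -> 1
```

Keeping the tree in two flat hashes rather than in one object per node is one plausible way to stay light on memory and dependencies, at the cost of the richer interface that Bio::Taxon offers.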
The modules are:
  Taxonomy::Base           -- Finds ancestors and ranks, converts between
                              names, ranks and IDs, etc.
  Taxonomy::RDP            -- Reads the taxonomic tree from the RDP XML file
  Taxonomy::NCBI           -- Reads the taxonomic tree from flat NCBI files
                              (nodes.dmp and names.dmp); similar to
                              Bio::DB::Taxonomy::flatfile
  Taxonomy::NCBI::Gi2taxid -- Converts NCBI GIs to Taxids quickly and
                              efficiently, using a binary lookup table
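
The layout of that binary lookup table is not described here. One common design, assumed for this sketch, stores every taxid as a fixed-width unsigned 32-bit integer so that the entry for GI n sits at byte offset 4 * n and a lookup is a single seek plus a 4-byte read. The names build_table and gi2taxid below are hypothetical and are not the interface of Taxonomy::NCBI::Gi2taxid; the toy mapping is made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $WIDTH = 4;    # bytes per record: one unsigned 32-bit big-endian integer

# Write a table in which the taxid for GI n is stored at byte offset n * 4.
# GIs that are absent from the mapping get the value 0.
sub build_table {
    my ($path, $gi_to_taxid) = @_;
    my $max_gi = 0;
    for my $gi (keys %$gi_to_taxid) {
        $max_gi = $gi if $gi > $max_gi;
    }
    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    for my $gi (0 .. $max_gi) {
        my $taxid = exists $gi_to_taxid->{$gi} ? $gi_to_taxid->{$gi} : 0;
        print {$out} pack('N', $taxid);
    }
    close $out or die "Cannot close $path: $!";
}

# Look up one GI with a single seek and a 4-byte read; only the record
# itself is held in memory. Returns undef when there is no mapping.
sub gi2taxid {
    my ($fh, $gi) = @_;
    seek($fh, $gi * $WIDTH, 0) or return;
    my $n = read($fh, my $buf, $WIDTH);
    return unless defined $n && $n == $WIDTH;    # GI beyond end of table
    my $taxid = unpack('N', $buf);
    return $taxid ? $taxid : undef;              # 0 means "no mapping"
}

# Toy mapping (hypothetical numbers, for illustration only).
my %toy = (3 => 9606, 7 => 562, 12 => 10090);
my (undef, $table) = tempfile(UNLINK => 1);
build_table($table, \%toy);

open my $fh, '<:raw', $table or die "Cannot read $table: $!";
for my $gi (3, 7, 5, 100) {
    my $taxid = gi2taxid($fh, $gi);
    printf "GI %d => %s\n", $gi, defined $taxid ? $taxid : 'not found';
}
```

With this layout the lookup cost does not depend on the number of GIs, and the table never has to be loaded into memory; the trade-off is a file whose size grows with the largest GI rather than with the number of mapped GIs.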
These modules are being used by several groups now -- mainly working
with large metagenomics datasets -- and I am considering uploading them
to CPAN, but I am not sure where they should be placed there.
How do you think I should name these modules (i.e. where should they
live on CPAN)? Their natural place could be under
Bio::DB::Taxonomy, maybe Bio::DB::Taxonomy::Lite /
Bio::DB::Taxonomy::Lite::NCBI / etc. Is this possible (and
advisable) without being part of BioPerl? Any other suggestions?
Thank you very much in advance,