[Bioperl-l] acquiring a local refseq + index

Chris Fields cjfields at uiuc.edu
Mon Jan 1 23:19:12 UTC 2007


On Jan 1, 2007, at 2:17 PM, Erik wrote:

>> we could add this in as long as it passes (I'll try giving it a
>> workout with my local bacterial seqs tonight or tomorrow).  However,
>> in the not-too-distant future your patch would likely be rendered
>> obsolete, as any parsing in Bio::SeqIO modules pertaining to
>> Bio::Species-related matters will be deprecated in favor of simple
>> parsing (more foolproof, less uncertainty) and Bio::Taxon (which has
>> optional db lookups using NCBI Taxonomy).  Bio::Species and anything
>> related to it are considered marked for deprecation.  Fair warning...
>
> What does simple parsing mean? Just returning the whole ORGANISM string,
> and leaving further parsing to the application side?

The current parser tries to determine genus/species from the sequence
record data alone, which has become increasingly difficult and
unreliable over the years. Since a perfectly valid source for taxonomic
information exists (NCBI Taxonomy), and each GenBank/EMBL sequence
record is tagged with the relevant TaxID, it makes more sense to base
reliable parsing of taxonomic data on that resource.
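To make the "simple parsing" idea concrete, here is a minimal plain-Perl
sketch (no BioPerl required) that keeps the ORGANISM string whole and
pulls the TaxID from the source feature's /db_xref="taxon:..." qualifier
instead of trying to split out genus/species. The helper name
organism_and_taxid and the trimmed record are hypothetical, made up for
illustration; this is not how Bio::SeqIO is implemented, and in real code
you would let Bio::SeqIO parse the record for you.

```perl
use strict;
use warnings;

# Hypothetical helper: return the ORGANISM line (unparsed) and the NCBI
# TaxID from a single GenBank flat-file record held in a string.
sub organism_and_taxid {
    my ($record) = @_;
    # Keep the whole ORGANISM string; no attempt to guess genus/species.
    my ($organism) = $record =~ /^\s+ORGANISM\s+(.+?)\s*$/m;
    # The TaxID is carried in the source feature as /db_xref="taxon:NNNN".
    my ($taxid) = $record =~ /\/db_xref="taxon:(\d+)"/;
    return ($organism, $taxid);
}

# A trimmed-down, made-up example record (not a real accession).
my $record = <<'END';
LOCUS       EXAMPLE01               60 bp    DNA     linear   BCT 01-JAN-2007
SOURCE      Escherichia coli K-12
  ORGANISM  Escherichia coli K-12
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
            Enterobacteriaceae; Escherichia.
FEATURES             Location/Qualifiers
     source          1..60
                     /organism="Escherichia coli K-12"
                     /db_xref="taxon:83333"
//
END

my ($organism, $taxid) = organism_and_taxid($record);
print "organism: $organism\n";    # organism: Escherichia coli K-12
print "taxid: $taxid\n";          # taxid: 83333
```

The TaxID is then the key you would hand to NCBI Taxonomy (e.g. via
Bio::Taxon) to get a reliable lineage, rather than trusting whatever the
ORGANISM text happens to look like.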

Sendu set up Bio::Taxon essentially for that reason; Bio::Species
has been changed to inherit from Bio::Taxon (which is also a
Bio::Tree::Node) but still exhibits the older behavior (i.e. it retains
the old API). It will gradually be phased out in favor of Bio::Taxon by
release 1.8. We hope.

> I shall look a bit closer at Bio::Taxon and its relation to the parser
> modules, assuming there still *is* a relation. :)
>
> Maybe someone could elaborate just a little bit to get me started on
> how to get taxonomic data from a RefSeq ID or a GenBank entry?

I'm assuming you could use code similar to that found in  
taxonomy2tree.pl (in the scripts/taxa directory in CVS).  I believe  
the NCBI taxid is accessible via:

$seq->species->ncbi_taxid

The script above should help somewhat, and I think the HOWTO on Trees
also has some more. Maybe some of the newer Bio::Taxon behavior needs
to be added to it at some point?

chris


