[Bioperl-l] acquiring a local refseq + index
Chris Fields
cjfields at uiuc.edu
Mon Jan 1 23:19:12 UTC 2007
On Jan 1, 2007, at 2:17 PM, Erik wrote:
>> we could add this in as long as it passes (I'll try giving it a
>> workout with my local bacterial seqs tonight or tomorrow). However,
>> in the not-too-distant future your patch would likely be rendered
>> obsolete, as any parsing in Bio::SeqIO modules pertaining to
>> Bio::Species-related matters will be deprecated in favor of simple
>> parsing (more foolproof, less uncertainty) and Bio::Taxon (which has
>> optional db lookups using NCBI Taxonomy). Bio::Species and anything
>> related to it are considered marked for deprecation. Fair warning...
>
> What does simple parsing mean? Just returning the whole ORGANISM string,
> and leaving further parsing to application side?
The current parsers try to determine genus/species from the sequence
record alone, which has become increasingly difficult and unreliable
over the years. Since a perfectly valid source of taxonomic
information exists (NCBI Taxonomy), and each GenBank/EMBL sequence
record is tagged with the relevant TaxID, it makes more sense to base
reliable parsing of taxonomic data on that resource.
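To make "simple parsing" a bit more concrete, here is a rough sketch in
plain Perl (not the actual Bio::SeqIO code). It returns the ORGANISM
string untouched and reads the TaxID from the /db_xref="taxon:..."
qualifier of the source feature, with no attempt to guess genus or
species. The helper name simple_taxon_info and the toy record are made
up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: given the text of one GenBank flat-file record,
# return the ORGANISM string as-is and the NCBI TaxID from the source
# feature. No genus/species guessing is attempted.
sub simple_taxon_info {
    my ($record) = @_;
    my ($organism) = $record =~ /^ +ORGANISM +(.+?)\s*$/m;
    my ($taxid)    = $record =~ m{/db_xref="taxon:(\d+)"};
    return { organism => $organism, taxid => $taxid };
}

# Toy record, heavily abbreviated and for illustration only.
my $record = <<'END';
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata.
FEATURES             Location/Qualifiers
     source          1..100
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
//
END

my $info = simple_taxon_info($record);
print "$info->{organism} (TaxID $info->{taxid})\n";
```

Anything beyond that (rank, lineage, common names) would then come
from NCBI Taxonomy via the TaxID rather than from the record text.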
Sendu has essentially set up Bio::Taxon for that reason; Bio::Species
has been changed to inherit from Bio::Taxon (which is also a
Bio::Tree::Node) but still exhibits the older behavior (i.e. it
retains the old API). It will gradually be phased out in favor of
Bio::Taxon by release 1.8. We hope.
> I shall look a bit closer at the Bio::Taxon and its relation to the parser
> modules, assuming there still *is* a relation. :)
>
> Maybe someone could elaborate just a little bit to get me started on how
> to get taxonomic data from a refseq id or a genbank entry?
I'm assuming you could use code similar to that found in
taxonomy2tree.pl (in the scripts/taxa directory in CVS). I believe
the NCBI taxid is accessible via:
    $seq->species->ncbi_taxid
That script should help somewhat, and I think the HOWTO on Trees
also has some more. Maybe some of the newer Bio::Taxon behavior
needs to be added at some point?
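Something along these lines might be a starting point. It is an
untested sketch, assuming a BioPerl recent enough to ship Bio::Taxon,
network access for the Entrez-backed Bio::DB::Taxonomy, and a GenBank
file passed on the command line. The sub name taxa_from_genbank is
made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: for each record in a GenBank file, read the TaxID
# and look up the matching Bio::Taxon in NCBI Taxonomy. BioPerl is loaded
# inside the sub so it is only needed when the sub is actually called.
sub taxa_from_genbank {
    my ($file) = @_;
    require Bio::SeqIO;
    require Bio::DB::Taxonomy;

    my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    my $db = Bio::DB::Taxonomy->new(-source => 'entrez');  # needs network

    my @found;
    while (my $seq = $in->next_seq) {
        my $species = $seq->species or next;    # record has no organism data
        my $taxid   = $species->ncbi_taxid or next;
        my $taxon   = $db->get_taxon(-taxonid => $taxid) or next;
        push @found, [ $seq->accession_number, $taxid, $taxon ];
    }
    return @found;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    for my $hit (taxa_from_genbank($ARGV[0])) {
        my ($acc, $taxid, $taxon) = @$hit;
        printf "%s\t%s\t%s\t%s\n",
            $acc, $taxid, $taxon->scientific_name, $taxon->rank;
    }
}
```

For a bare RefSeq ID you would first need to fetch the record (e.g.
with one of the Bio::DB modules) and then proceed the same way.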
chris