[Bioperl-l] acquiring a local refseq + index

Erik er at xs4all.nl
Tue Jan 2 16:23:12 UTC 2007


That seems like a real improvement over parsing the name out of the
text entry. I'll use $taxid = $seq->species->ncbi_taxid from now on.
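
For the archive, here is a minimal sketch of how I plan to use it. The
record is made up (TEST0001 is a hypothetical accession), the helper name
taxid_of is my own, and the guard for a missing species object is an
assumption on my part, since I am not sure every entry yields one.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Made-up minimal GenBank record, for illustration only.
my $record = <<'END_GB';
LOCUS       TEST0001                  12 bp    DNA     linear   BCT 01-JAN-2007
DEFINITION  Hypothetical test record.
ACCESSION   TEST0001
VERSION     TEST0001.1
SOURCE      Escherichia coli
  ORGANISM  Escherichia coli
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
            Enterobacteriaceae; Escherichia.
FEATURES             Location/Qualifiers
     source          1..12
                     /organism="Escherichia coli"
                     /db_xref="taxon:562"
ORIGIN
        1 atgcatgcat gc
//
END_GB

# Return the NCBI taxid of a sequence object, or undef when the record
# carries no species information (hypothetical helper, not part of BioPerl).
sub taxid_of {
    my ($seq) = @_;
    my $species = $seq->species or return;
    return $species->ncbi_taxid;
}

# Read the record from an in-memory filehandle instead of a file on disk.
open my $fh, '<', \$record or die "cannot open in-memory record: $!";
my $in = Bio::SeqIO->new(-fh => $fh, -format => 'genbank');

while (my $seq = $in->next_seq) {
    my $taxid = taxid_of($seq);
    printf "%s\t%s\n", $seq->accession_number,
        defined $taxid ? $taxid : 'no taxid';
}
```

For a real run, the in-memory filehandle would be replaced by
-file => 'some_refseq_file.gbk' (a placeholder name).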

Thanks for that elucidation. :)



That leaves the error-throwing problem in Bio::DB::Flat, which I
encountered while making a local RefSeq BerkeleyDB index.

I suppose it remains worthwhile to prevent the indexing from breaking on
Bio::SeqIO instantiation (at least for the RefSeq entry set), so I have
put a simple fix on Bugzilla that prevents one more problem entry
(NC_004822) from breaking the indexing process.


Thanks,

Erikjan





> On Jan 1, 2007, at 2:17 PM, Erik wrote:
>
>>> we could add this in as long as it passes (I'll try giving it a
>>> workout with my local bacterial seqs tonight or tomorrow).  However,
>>> in the not-too-distant future your patch would likely be rendered
>>> obsolete, as any parsing in Bio::SeqIO modules pertaining to
>>> Bio::Species-related matters will be deprecated in favor of simple
>>> parsing (more foolproof, less uncertainty) and Bio::Taxon (which has
>>> optional db lookups using NCBI Taxonomy).  Bio::Species and anything
>>> related to it are considered marked for deprecation.  Fair warning...
>>
>> What does simple parsing mean? Just returning the whole ORGANISM string
>> and leaving further parsing to the application side?
>
> Current parsing behavior tries to determine genus/species from the
> data in the sequence record alone, which has become increasingly
> difficult and unreliable over the years.  Since a
> perfectly valid source for taxonomic information exists (NCBI
> Taxonomy), and each GenBank/EMBL sequence record is tagged with a
> relevant TaxID, it makes more sense to base reliable parsing of
> taxonomic data on that resource.
>
> Sendu has essentially set up Bio::Taxon for that reason; Bio::Species
> has been changed to inherit Bio::Taxon (which is also a
> Bio::Tree::Node) but still exhibit older behavior (i.e. retain the
> old API).  It will gradually be shifted out in favor of Bio::Taxon by
> rel 1.8.  We hope.
>
>> I shall look a bit closer at Bio::Taxon and its relation to the parser
>> modules, assuming there still *is* a relation. :)
>>
>> Maybe someone could elaborate just a little bit to get me started on
>> how to get taxonomic data from a RefSeq ID or a GenBank entry?
>
> I'm assuming you could use code similar to that found in
> taxonomy2tree.pl (in the scripts/taxa directory in CVS).  I believe
> the NCBI taxid is accessible via:
>
> $seq->species->ncbi_taxid
>
> The script above should help somewhat, and the HOWTO on Trees I think
> also has some more.  Maybe some of the newer Bio::Taxon behavior
> needs to be added at some point?
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>




