[Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon id
Peter
biopython at maubp.freeserve.co.uk
Thu Mar 13 23:13:32 UTC 2008
On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> (this is more of a bioperl question than a biosql one)
Well, yes and no. And I'm not subscribed to the Bioperl list, nor the
BioJava one, nor the BioRuby one.
> The load_ncbi_taxonomy.pl script is designed to update the taxon
> tables in a non-disruptive way, and if there weren't many changes
> shouldn't actually take that long (except that recalculating the
> nested set values may take a couple of minutes).
Do you think that, when faced with a novel taxon id, Biopython/BioPerl/...
could write a minimal taxonomy entry (without any guesswork based on
the species name) in order to record the sequence's taxon, so that
running an improved load_ncbi_taxonomy.pl at a later date would sort
out the proper taxonomy?
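
A rough sketch of the "minimal taxonomy entry" idea, not something
Biopython or BioPerl is known to do. It assumes a cut-down stand-in for
the BioSQL taxon table in an in-memory SQLite database; the helper name
get_or_create_stub_taxon and the taxon id 999999 are hypothetical. Only
the NCBI taxon id is recorded, and the parent, rank and nested set
values are left NULL for a later taxonomy load to fill in.

```python
import sqlite3

# Assumption: a simplified stand-in for the BioSQL taxon table, with only
# the columns this sketch needs (the real schema has more constraints).
SCHEMA = """
CREATE TABLE taxon (
    taxon_id        INTEGER PRIMARY KEY,
    ncbi_taxon_id   INTEGER UNIQUE,
    parent_taxon_id INTEGER,
    node_rank       TEXT,
    left_value      INTEGER,
    right_value     INTEGER
);
"""


def get_or_create_stub_taxon(conn, ncbi_taxon_id):
    """Return the internal taxon_id, adding a stub row if none exists.

    Hypothetical helper: the stub stores the NCBI taxon id only, with no
    lineage or names guessed from the sequence file.
    """
    row = conn.execute(
        "SELECT taxon_id FROM taxon WHERE ncbi_taxon_id = ?",
        (ncbi_taxon_id,),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO taxon (ncbi_taxon_id) VALUES (?)",
        (ncbi_taxon_id,),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# 999999 is a made-up id standing in for a taxon not yet in the database.
first = get_or_create_stub_taxon(conn, 999999)
second = get_or_create_stub_taxon(conn, 999999)  # reuses the stub row
```

Because the stub is keyed on the NCBI taxon id alone, a later taxonomy
update could in principle match it by that id and fill in the rest.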
> Bioperl-db will store the taxon information it finds in the
> Bio::Species object if it can't locate the taxon by lookup, and will
> not raise an error. The problem with this is that it relies on the
> Bio::SeqIO parser to have gotten the species and lineage information
> correct, which is sometimes a wrong assumption for exotic species.
> Most often the error will not manifest itself at the time of storing
> the erroneously parsed information, but when it is re-retrieved and
> used to populate a Bio::Species object.
This is what I would like to avoid with Biopython.
> For the SymAtlas project we had this situation (new species in
> sequence updates that the last NCBI taxonomy update hadn't yet
> brought in) quite regularly. I wrote a SQL script that would fix those
> 'haphazard' additions such that load_ncbi_taxonomy would update them
> to their correct values come the next NCBI taxonomy update. I can
> send you the script (it would be for the Oracle version), but I'm not
> sure this is a widely viable strategy.
So this wasn't integrated with load_ncbi_taxonomy.pl at all?
Peter