[Biopython-dev] BioSQL questions

Brad Chapman chapmanb at arches.uga.edu
Sun Nov 24 15:37:08 EST 2002


> Bioentries are retrieved via two classes, DBSeqRecord and DBInternalSeq. 
>  Why?
> 
> I'm asking because it seems sound, but I don't understand why which data 
> is retrieved by which class, e.g. sequence name by DBInternalSeq.

This is just an implementation detail that has carried over from the
first stab at this by Andrew. He'll have to tell us why he split it
like this; he probably had some mad idea in his head at the time, but I
reckon now it is more arbitrary than anything.

> Concerning the sequence name: a bioentry contains four ids:
> 
> * bioentry_id, the internal id
> 
> * display_id, called "name" by SeqRecord
> 
> * accession, called "id"
> 
> * identifier, not retrieved.
> 
> In Genbank parlance, display_id == LOCUS, accession == ACCESSION (duh!) 
> and identifier == GI.

Yeah, this is hideous and completely my fault. What I think we
should have is:

name -- the LOCUS/display_id

id -- the GI/identifier; this should not be the accession, that sucks.

accessions -- a list of accession numbers; I think Jeff said something
about multiple accession numbers recently, so we can support them all
and then just jam the first one into BioSQL.
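
As a rough sketch of that mapping (not actual Biopython or BioSQL code:
ProposedRecord, to_bioentry_row and from_bioentry_row are hypothetical
names made up for illustration, and the row is just a dict keyed by the
bioentry column names mentioned above):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProposedRecord:
    """Hypothetical stand-in for a record with the proposed attributes."""
    name: str                      # LOCUS / display_id
    id: Optional[str]              # GI / identifier, not the accession
    accessions: List[str] = field(default_factory=list)


def to_bioentry_row(record):
    """Map a record onto the bioentry columns discussed in this thread."""
    return {
        "display_id": record.name,
        "identifier": record.id,
        # Only one accession fits in the row, so store the first one.
        "accession": record.accessions[0] if record.accessions else None,
    }


def from_bioentry_row(row):
    """Rebuild a record from a bioentry row (a plain dict in this sketch)."""
    accession = row.get("accession")
    return ProposedRecord(
        name=row["display_id"],
        id=row.get("identifier"),
        accessions=[accession] if accession else [],
    )
```

Note that the round trip loses every accession after the first, which is
the price of jamming only one into the database.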

> Obviously, the GI should be retrieved

Right now GI is nastily jammed into data.accessions['gi']. Yuck.

How does this sound for a plan? Does this take care of most of the
ugliness?

Brad


