[Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements

bugzilla-daemon at portal.open-bio.org
Fri Nov 21 22:41:16 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2681





------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-21 17:41 EST -------
(In reply to comment #0)
> 1) Fixed date/dates typo.

Why is it a typo?  Change not checked in.

> 2) Comments were being stored but not retrieved - fixed with test.

Looks good, except for returning an empty list if there were no comments.

> 3) A 'reference' annotation, even if an empty list, was being retrieved in a
> DBSeqRecord. Fixed so that if there are no references there is no annotation
> in DBSeqRecord.

I agree, but preferred a smaller change for this:

Checking in BioSQL/BioSeq.py;
/home/repository/biopython/biopython/BioSQL/BioSeq.py,v  <--  BioSeq.py
new revision: 1.33; previous revision: 1.32
done
Checking in Tests/test_BioSQL_SeqIO.py;
/home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v  <--  test_BioSQL_SeqIO.py
new revision: 1.29; previous revision: 1.28
done

This was based closely on your patch, so thank you!  You are making steady
progress through the remaining "TODO" notes I left when writing
test_BioSQL_SeqIO.py :)
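
To illustrate the behaviour agreed on in point 3, here is a minimal sketch.
It is not the code that was checked in to BioSeq.py; the helper name
add_optional_annotation and the plain dict standing in for the record's
annotations are assumptions made for the example. The idea is only that the
key is left out entirely when the database holds nothing for it:

```python
def add_optional_annotation(annotations, key, values):
    """Store values under key only when there is something to store.

    Hypothetical helper for illustration; `annotations` is a plain dict
    standing in for a record's annotations dictionary.
    """
    if values:
        annotations[key] = list(values)
    return annotations


# No references found in the database: the key is simply absent.
empty = add_optional_annotation({}, "references", [])

# One reference found: the key is present and holds a list.
filled = add_optional_annotation({}, "references", ["reference 1"])
```

With this pattern a caller can test for "references" in the annotations
rather than having to check for an empty list.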

> Some swiss prot SeqRecords have ncbi_taxid and they are retrieved
> correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing
> from the retrieved DBSeqRecord: sp012, sp014, 

Note that some Swiss-Prot records may be multi-species, which the BioSQL
schema can't cope with.  Not sure if that applies here.

> Swissprot, fasta, and EMBL SeqRecords don't have a gi annotation, retrieved
> DBSeqRecords do. Loader uses the 'record_id' (line 522) as the identifier in
> bioentry, if the gi annotation is missing, which is pulled as the gi
> annotation.

There probably is something not quite right here.  Are you talking about the
bioentry.identifier entry in the database?  Perhaps an explicit example might
help.  As an aside, I think "gi" (the GenInfo Identifier used by NCBI) might
be better stored in the record.dbxrefs, but that could be a parser change...
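
As a starting point for such an example, here is a rough sketch of the round
trip as I understand it from the report, followed by the dbxrefs alternative.
None of this is copied from the loader: the function names, the plain
dict/list stand-ins for record.annotations and record.dbxrefs, the "GI:"
prefix, and the identifier "example_id" are all assumptions for illustration.

```python
def stored_identifier(record_id, annotations):
    # Loader side, as described in the report: use the gi annotation
    # if present, otherwise fall back to the record's id.
    return annotations.get("gi", record_id)


def retrieved_annotations(identifier):
    # Retrieval side, as described in the report: the stored identifier
    # comes back as the "gi" annotation.
    return {"gi": identifier}


def gi_to_dbxref(annotations, dbxrefs):
    # Alternative: move any "gi" annotation into the cross-references.
    # Works on copies so the inputs are left untouched.
    annotations = dict(annotations)
    dbxrefs = list(dbxrefs)
    gi = annotations.pop("gi", None)
    if gi is not None:
        xref = "GI:%s" % gi
        if xref not in dbxrefs:
            dbxrefs.append(xref)
    return annotations, dbxrefs


# A record with no gi annotation (e.g. parsed from FASTA) ...
original = {}
# ... gains one after a load/retrieve round trip, holding its own id.
round_tripped = retrieved_annotations(stored_identifier("example_id", original))

# A record that really has a gi, with the value moved into dbxrefs.
new_annotations, new_dbxrefs = gi_to_dbxref({"gi": "12345"}, [])
```

If the loader really behaves like the first two functions, the retrieved
record would carry a "gi" annotation that the original never had, which
matches the mismatch described above.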

> 'contig' is ignored by loader because it's a SeqFeature object. Is there any
> reason it couldn't be loaded and retrieved? (record is GenBank/NT_019265.gb)

I couldn't even say offhand how the CONTIG line in that example would be
parsed, let alone how it gets dealt with when loading into BioSQL.




