[Biopython-dev] [Biopython (old issues only) - Bug #2681] (Migrated) BioSQL: record annotations enhancements

redmine at redmine.open-bio.org redmine at redmine.open-bio.org
Thu Jul 5 13:43:53 UTC 2018


Issue #2681 has been updated by Peter Cock.

Description updated
Status changed from New to Migrated

Migrated to GitHub issue https://github.com/biopython/biopython/issues/1718

----------------------------------------
Bug #2681: BioSQL: record annotations enhancements
https://redmine.open-bio.org/issues/2681#change-15405

* Author: Cymon J.
* Status: Migrated
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: BioSQL
* Target version: Not Applicable
* URL: 
----------------------------------------
BioSQL storage and retrieval of record annotations. See also bug 2396.


Patch fixes 3 annotations:

1) Fixed date/dates typo.
2) Comments were being stored but not retrieved - fixed, with a test.
3) A 'reference' annotation, even if an empty list, was being retrieved in a
DBSeqRecord. Fixed so that if there are no references there is no annotation in
DBSeqRecord.

Other annotations:

'date', 'ncbi_taxid', 'gi', and 'contig' are the only annotations we are not
handling correctly in the test suite.

'date' can be ignored if present in the DBSeqRecord but absent in the SeqRecord,
because the current date is entered into the table if a date is not present in
the record.

Annotation 'ncbi_taxid' will be present in DBSeqRecords even when not present
in the loaded SeqRecord, as it is taken from the taxon table. We can therefore
ignore this specific comparison: old record absent, new record present. Some
SwissProt SeqRecords have ncbi_taxid and it is retrieved correctly by
DBSeqRecord. TODO: others have an ncbi_taxid that is missing from the retrieved
DBSeqRecord: sp012, sp014,
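
As a rough sketch of the comparison rule described for 'date' and 'ncbi_taxid'
(plain dicts stand in for the annotations of the loaded SeqRecord and the
retrieved DBSeqRecord; the helper name annotation_mismatches is hypothetical
and is not part of Biopython or of the attached patch):

```python
def annotation_mismatches(old, new):
    """List annotation keys that differ between the two records.

    old: annotations of the SeqRecord that was loaded (a dict).
    new: annotations of the DBSeqRecord that was retrieved (a dict).
    Hypothetical helper for illustration only.
    """
    # Keys the database may supply by itself: the current date is stored
    # when the record has none, and ncbi_taxid comes from the taxon table.
    db_supplied = {"date", "ncbi_taxid"}
    mismatches = []
    for key in sorted(set(old) | set(new)):
        if key in db_supplied and key not in old:
            # Old record absent, new record present: ignore.
            continue
        if key in old and key in new and old[key] == new[key]:
            continue
        mismatches.append(key)
    return mismatches
```

Note that the opposite case (ncbi_taxid present in the loaded record but
missing from the retrieved one, as in the TODO above) is still reported as a
mismatch by this sketch.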

SwissProt, FASTA, and EMBL SeqRecords don't have a gi annotation, but the
retrieved DBSeqRecords do. If the gi annotation is missing, the loader uses the
'record_id' (line 522) as the identifier in bioentry, and that identifier is
then pulled back as the gi annotation. So the SwissProt, FASTA, and EMBL
DBSeqRecords return the accession as the gi (GenBank identifier). I think this
is misleading; the 'gi' annotation in the DBSeqRecord should really be given a
more generic name such as 'identifier'. What to do here?
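
To make the round trip concrete, here is a minimal sketch of the fallback as
described above. The function names are hypothetical and the logic is only my
reading of the loader's behaviour, not a copy of the actual code:

```python
def bioentry_identifier(record_id, annotations):
    # Assumed behaviour: prefer the gi annotation, and fall back to the
    # record id when the record has no gi.
    return annotations.get("gi", record_id)


def retrieved_gi(identifier):
    # On retrieval, the stored identifier comes back under the 'gi' key,
    # whatever it originally was.
    return {"gi": identifier}
```

With these assumptions, a SwissProt-style record with id "P12345" and no gi
annotation is stored with identifier "P12345" and comes back with a 'gi'
annotation of "P12345", i.e. the accession.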

'contig' is ignored by the loader because it's a SeqFeature object. Is there any
reason it couldn't be loaded and retrieved? (The record is GenBank/NT_019265.gb.)

---Files--------------------------------
annotations1.patch (3.9 KB)

