[Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements

bugzilla-daemon at portal.open-bio.org
Fri Jan 30 15:36:52 UTC 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2681





------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 10:36 EST -------
(In reply to comment #2)
> > 'contig' is ignored by loader because it's a SeqFeature object. Is there any
> > reason it couldnt be loaded and retrieved? (record is GenBank/NT_019265.gb)
> 
> I couldn't even say off hand how the CONTIG line in that example would be
> parsed, let alone how it gets dealt with when loading into BioSQL.

Basically the CONTIG line looks rather a lot like a feature location, typically
the join of lots of (external) sequences.  It makes some sense to parse this
into an object structure and, given the way joins are handled for features,
this led the original author to represent the CONTIG information as a dummy
feature with lots of sub-features.  However, the CONTIG can also include gaps
(of unknown length), which don't quite fit the current SeqFeature location
objects (see Bug 2745).
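
To illustrate why the gaps are awkward, here is a minimal sketch in plain
Python (not the Biopython GenBank parser) that splits a CONTIG-style join into
its parts.  The CONTIG string is made up for illustration, and the function
name split_contig and the tuple layout are hypothetical:

```python
import re

# Made-up CONTIG value for illustration only (not taken from NT_019265).
CONTIG = ("join(AAAA01000001.1:1..500,gap(100),"
          "complement(BBBB01000002.1:1..300),gap())")

# Either a gap(...) entry, or an (optionally complemented) accession:start..end.
_PART = re.compile(
    r"(?P<gap>gap\((?P<gap_len>[^)]*)\))"
    r"|(?P<comp>complement\()?(?P<acc>[A-Za-z0-9_.]+):(?P<start>\d+)\.\.(?P<end>\d+)\)?"
)

def split_contig(text):
    """Split a join(...) CONTIG value into simple tuples (hypothetical helper)."""
    text = text.strip()
    if text.startswith("join(") and text.endswith(")"):
        text = text[len("join("):-1]
    parts = []
    for match in _PART.finditer(text):
        if match.group("gap"):
            # Gap length kept as the raw string; it may be empty (unknown length).
            parts.append(("gap", match.group("gap_len")))
        else:
            strand = -1 if match.group("comp") else 1
            parts.append((match.group("acc"),
                          int(match.group("start")),
                          int(match.group("end")),
                          strand))
    return parts

parts = split_contig(CONTIG)
```

The sequence parts map naturally onto (accession, start, end, strand), but the
gap entries have no reference sequence or coordinates of their own, which is
the bit that doesn't fit a normal feature location.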

If we extend the location objects to cope with these gaps, then perhaps the
CONTIG can stay as a SeqFeature, in which case for BioSQL maybe we should
record it in the seqfeature table.  We'd have to invent a way to record these
gap locations though.

However, if we just stored the CONTIG line as a raw string, we could then store
it in BioSQL as just another bioentry qualifier (assuming it doesn't overflow
the text field limit).

I've checked how and where BioPerl stores the contig information using the
example Bruce used on Bug 2745, attachment 1213, and see that the CONTIG
information is stored in the bioentry_qualifier_value table under the term
"contig" under the ontology "Annotation Tags".  They have retained the separate
lines, storing each as a separate entry with an increasing rank.
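
As a rough sketch of that layout, the following uses an in-memory SQLite
database with cut-down stand-ins for the BioSQL ontology, term and
bioentry_qualifier_value tables (the real schema has more columns and
constraints).  The CONTIG lines, the bioentry_id, and starting the rank at 1
are all assumptions for illustration:

```python
import sqlite3

# Made-up CONTIG lines, standing in for the wrapped lines of a GenBank record.
contig_lines = [
    "join(AAAA01000001.1:1..500,gap(100),",
    "complement(BBBB01000002.1:1..300))",
]

conn = sqlite3.connect(":memory:")
# Simplified versions of the three BioSQL tables involved.
conn.executescript("""
CREATE TABLE ontology (ontology_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT, ontology_id INTEGER);
CREATE TABLE bioentry_qualifier_value (
    bioentry_id INTEGER, term_id INTEGER, value TEXT, rank INTEGER);
""")
conn.execute("INSERT INTO ontology VALUES (1, 'Annotation Tags')")
conn.execute("INSERT INTO term VALUES (1, 'contig', 1)")

bioentry_id = 1  # hypothetical bioentry
# One row per CONTIG line; the starting rank of 1 is an assumption.
for rank, line in enumerate(contig_lines, start=1):
    conn.execute(
        "INSERT INTO bioentry_qualifier_value VALUES (?, ?, ?, ?)",
        (bioentry_id, 1, line, rank),
    )

# Fetch the lines back in rank order and rebuild the single CONTIG string.
rows = conn.execute("""
    SELECT v.rank, v.value
    FROM bioentry_qualifier_value v
    JOIN term t ON t.term_id = v.term_id
    JOIN ontology o ON o.ontology_id = t.ontology_id
    WHERE v.bioentry_id = ? AND t.name = 'contig'
      AND o.name = 'Annotation Tags'
    ORDER BY v.rank
""", (bioentry_id,)).fetchall()
contig = "".join(value for _rank, value in rows)
```

Ordering by rank is what lets the separate lines be joined back into the
original CONTIG text.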

Thus for compatibility with how BioPerl uses BioSQL, it would make sense for
the GenBank parser to store the CONTIG line as a simple string (or list of
strings), and not as a SeqFeature (which is currently half-broken anyway, see
Bug 2745).

