[BioSQL-l] Handling four strand states (as in GFF3) in BioSQL

Peter Cock p.j.a.cock at googlemail.com
Tue May 24 11:36:57 UTC 2011


Dear all,

This email was triggered by a Biopython discussion about how to
represent the four strand states in GFF3 (+, -, ? and .) in Biopython
(we are using +1, -1, 0 and None). See e.g.

http://lists.open-bio.org/pipermail/biopython/2011-April/007194.html
http://lists.open-bio.org/pipermail/biopython/2011-May/007299.html

The GFF3 spec defines strand as follows, see:
http://www.sequenceontology.org/gff3.shtml

> Column 7: "strand"
> The strand of the feature. + for positive strand (relative to the
> landmark), - for minus strand, and . for features that are not
> stranded. In addition, ? can be used for features whose
> strandedness is relevant, but unknown.

The BioSQL schema uses a tiny int (not null) for the strand, so three
states -1, 0 and +1 are fine - but not a fourth state of not applicable
(which would map nicely to null).
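
To illustrate the constraint, here is a minimal sketch using Python's
sqlite3 module and an in-memory toy table. The table name toy_location
and the helper can_store_strand are made up for this example; this is
not the real BioSQL schema, just a single strand column declared with
or without NOT NULL.

```python
import sqlite3


def can_store_strand(nullable):
    """Return True if a NULL strand can be inserted into a toy table.

    The table is a hypothetical stand-in with only a strand column,
    declared NOT NULL unless nullable is True.
    """
    conn = sqlite3.connect(":memory:")
    constraint = "" if nullable else " NOT NULL"
    conn.execute("CREATE TABLE toy_location (strand TINYINT%s)" % constraint)
    try:
        # None is passed to the database as SQL NULL.
        conn.execute("INSERT INTO toy_location (strand) VALUES (?)", (None,))
        return True
    except sqlite3.IntegrityError:
        # Raised when the NOT NULL constraint rejects the row.
        return False
    finally:
        conn.close()
```

With the NOT NULL declaration the insert is rejected, so a "not
applicable" strand has nowhere to go; without it the NULL is stored.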

Currently I presume all the BioSQL libraries use 0 in the BioSQL
database for anything other than a +1 or -1 strand, effectively
covering "non-stranded" and "stranded but unknown" in one group.

If we want to extend BioSQL to allow four strand states as in GFF3,
the simplest solution could be to allow null for this column. Then:

GFF3 "+" (forward) becomes +1 in BioSQL
GFF3 "-" (reverse) becomes -1 in BioSQL
GFF3 "?" (stranded but unknown) becomes 0 in BioSQL
GFF3 "." (not stranded) becomes NULL in BioSQL
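
As a rough sketch of the mapping above (assuming the strand column were
changed to allow NULL), something like the following could be used on
the Python side. The names GFF3_TO_BIOSQL, gff3_strand_to_biosql and
biosql_strand_to_gff3 are hypothetical and not part of Biopython or any
BioSQL binding; None stands for a database NULL.

```python
# Proposed mapping from the GFF3 strand column to a nullable BioSQL strand.
# None represents SQL NULL. These names are illustrative only.
GFF3_TO_BIOSQL = {
    "+": 1,     # forward
    "-": -1,    # reverse
    "?": 0,     # stranded, but unknown
    ".": None,  # not stranded
}

# Reverse lookup, for writing GFF3 from values read out of the database.
BIOSQL_TO_GFF3 = {value: symbol for symbol, value in GFF3_TO_BIOSQL.items()}


def gff3_strand_to_biosql(symbol):
    """Convert a GFF3 strand symbol to the proposed BioSQL value."""
    try:
        return GFF3_TO_BIOSQL[symbol]
    except KeyError:
        raise ValueError("Unexpected GFF3 strand: %r" % (symbol,))


def biosql_strand_to_gff3(value):
    """Convert a proposed BioSQL strand value back to a GFF3 symbol."""
    try:
        return BIOSQL_TO_GFF3[value]
    except KeyError:
        raise ValueError("Unexpected BioSQL strand: %r" % (value,))
```

Because the four states map one-to-one, a round trip through the
database would preserve the GFF3 strand exactly, which is not the case
while both "?" and "." are stored as 0.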

On the other hand, this fine distinction is of limited utility. For
example, when storing protein records in BioSQL, we can just continue
to use zero in the database as the feature strand.

Is this worth changing?

Peter


