[BioSQL-l] Recording "nucleotide" in the sequence table?

Sat May 16 11:53:07 UTC 2009

Hi all,

You may recall a year ago or so, we talked about how BioPerl and
Biopython used lower case alphabet names ("dna", "rna", "protein")
while BioJava was inconsistent and used upper (or even mixed) case.

http://lists.open-bio.org/pipermail/biopython/2007-November/003894.html
http://lists.open-bio.org/pipermail/biojava-l/2007-November/006034.html
http://lists.open-bio.org/pipermail/biosql-l/2008-March/001185.html

You'll notice that thread was split over several mailing lists (and
looking back, I think I missed some posts as I only read the Biopython
and BioSQL lists).

Anyway, this led to the following proposal:

http://www.biosql.org/wiki/Enhancement_Requests#Check_constraint_on_biosequence.alphabet

In Biopython we also use "unknown" for sequences which are not known
to be "dna", "rna" or "protein". I presume this was copied from BioPerl.

In a recent bug report (Bug 2829) it was pointed out that we
(Biopython) don't attempt to record nucleotide alphabets in BioSQL
(i.e. a sequence which could be DNA or RNA, but we don't know which);
such sequences just get "unknown" as their biosequence.alphabet entry.

Is there any precedent in BioPerl, BioJava or BioRuby for how to
handle this?  If not, I'd like to introduce and agree on "nucleotide"
for this situation.
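For concreteness, here is a rough sketch (Python, standard library
sqlite3 only) of how the proposed check constraint might look if
"nucleotide" were added. The table is a hypothetical, heavily
simplified stand-in for BioSQL's biosequence table, the list of
allowed values is my assumption rather than the agreed wording on
the wiki, and make_db/store are illustrative names only.

```python
import sqlite3

# Assumed set of allowed alphabet names: the lower-case ones discussed
# above plus the proposed "nucleotide". Not the official BioSQL list.
ALLOWED_ALPHABETS = ("dna", "rna", "protein", "unknown", "nucleotide")


def make_db():
    """Create an in-memory, simplified (hypothetical) biosequence table."""
    conn = sqlite3.connect(":memory:")
    allowed = ", ".join(f"'{a}'" for a in ALLOWED_ALPHABETS)
    conn.execute(
        "CREATE TABLE biosequence ("
        " bioentry_id INTEGER PRIMARY KEY,"
        " alphabet TEXT CHECK (alphabet IN (" + allowed + ")),"
        " seq TEXT)"
    )
    return conn


def store(conn, bioentry_id, alphabet, seq):
    """Insert a sequence, lower-casing the alphabet name first.

    Lower-casing means "DNA" or "Dna" end up stored as "dna"; any name
    outside ALLOWED_ALPHABETS is rejected by the CHECK constraint.
    """
    conn.execute(
        "INSERT INTO biosequence (bioentry_id, alphabet, seq) VALUES (?, ?, ?)",
        (bioentry_id, alphabet.lower(), seq),
    )
```

With this in place, store(conn, 1, "DNA", "ACGT") is saved with
alphabet "dna", a "nucleotide" entry is accepted, and a name such as
"aminoacid" is refused by the database with sqlite3.IntegrityError.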

Peter