[BioSQL-l] Recording "nucleotide" in the sequence table?

Mark Schreiber markjschreiber at gmail.com
Sat May 16 14:58:19 UTC 2009


I don't think you can do this with certainty. If you don't know the source
alphabet, then an amino acid sequence could look like DNA if it only uses
A, C, G, T and some of the ambiguity codes.

If it is a long sequence, it becomes increasingly unlikely that it is an amino
acid sequence, but it is never certain.
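To make the point concrete, here is a minimal sketch, assuming the standard IUPAC one-letter codes. The letter sets and the `candidate_alphabets` function are hypothetical illustrations, not part of Biopython, BioPerl, BioJava or BioSQL. It lists every alphabet whose letters could spell a given sequence; a string such as "ACGTACGT" is valid DNA and also a valid peptide (Ala, Cys, Gly, Thr).

```python
# Hypothetical helper, not from any Bio* library.
# Letter sets assume the IUPAC one-letter codes, including nucleotide
# ambiguity codes (R, Y, S, W, K, M, B, D, H, V, N).
DNA_LETTERS = set("ACGTRYSWKMBDHVN")
RNA_LETTERS = set("ACGURYSWKMBDHVN")
# The 20 standard amino acids plus B (Asx), Z (Glx) and X (any).
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWYBZX")


def candidate_alphabets(seq):
    """Return every alphabet whose letter set can spell seq (case-insensitive)."""
    letters = set(seq.upper())
    candidates = []
    if letters <= DNA_LETTERS:
        candidates.append("dna")
    if letters <= RNA_LETTERS:
        candidates.append("rna")
    if letters <= PROTEIN_LETTERS:
        candidates.append("protein")
    return candidates


if __name__ == "__main__":
    for s in ["ACGTACGT", "acgacg", "ACGU", "MEEPQSDPSV"]:
        print(s, candidate_alphabets(s))
```

With these letter sets, every nucleotide code except U is also a protein letter, so a letter check can only rule out protein when the sequence contains something like E, F, I, L, P or Q. Any length-based heuristic can make protein less likely, but, as noted above, never impossible.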

On 16 May 2009, 7:54 PM, "Peter" <biopython at maubp.freeserve.co.uk> wrote:

Hi all,

You may recall a year ago or so, we talked about how BioPerl and
Biopython used lower case alphabet names ("dna", "rna", "protein")
while BioJava was inconsistent and used upper (or even mixed) case.

http://lists.open-bio.org/pipermail/biopython/2007-November/003894.html
http://lists.open-bio.org/pipermail/biojava-l/2007-November/006034.html
http://lists.open-bio.org/pipermail/biosql-l/2008-March/001185.html

You'll notice that thread was split over several mailing lists (and
looking back, I think I missed some posts as I only read the Biopython
and BioSQL lists).

Anyway, this led to the following proposal:

http://www.biosql.org/wiki/Enhancement_Requests#Check_constraint_on_biosequence.alphabet

In Biopython we also use "unknown" for sequences which are not known
to be "dna", "rna", "protein".  I presume this was copying BioPerl.

In a recent bug report (Bug 2829) it was pointed out that we
(Biopython) don't attempt to record nucleotide alphabets in BioSQL
(i.e. a sequence which could be DNA or RNA but we don't know which),
they just get "unknown" as their biosequence.alphabet entry.

Is there any precedent in BioPerl, BioJava or BioRuby for how to
handle this?  If not, I'd like to introduce and agree on "nucleotide"
for this situation.

Peter
_______________________________________________
BioSQL-l mailing list
BioSQL-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biosql-l


