[Biopython-dev] [Bug 2408] GenBank records do not contain U's

Thu Nov 22 05:18:03 UTC 2007

http://bugzilla.open-bio.org/show_bug.cgi?id=2408


------- Comment #2 from marcin.cieslik at gmail.com  2007-11-22 00:18 EST -------
I don't believe there are any U-containing records in GenBank, since it is DNA
that is being sequenced; mRNA records also have T's. I think assigning alphabets
based on the parsed sequence is overkill, since it will make translation more
difficult (ambiguous vs. unambiguous alphabets lead to Translator exceptions).
I would like the parser to return IUPAC.ambiguous_dna, but that is just my
preference, and since I can fix it with a single record.alphabet assignment it
is not critical at all.
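
For illustration, the two options raised in comment #1 (return DNA based on the
content, or switch the T's into U's) could be sketched in plain Python roughly
as below. This is only a sketch: guess_nucleotide_type and dna_to_rna are
hypothetical helper names, not part of Biopython, and they work on plain
strings rather than Seq objects or alphabets.

```python
# Hypothetical helpers for illustration only; not part of Biopython.

def guess_nucleotide_type(sequence):
    """Classify a nucleotide string by whether it contains T and/or U."""
    letters = set(sequence.upper())
    has_t = "T" in letters
    has_u = "U" in letters
    if has_t and has_u:
        return "mixed"
    if has_u:
        return "RNA"
    if has_t:
        return "DNA"
    return "unknown"  # neither T nor U present, so the content cannot decide


def dna_to_rna(sequence):
    """Rewrite T as U (and t as u), leaving every other letter unchanged."""
    return sequence.replace("T", "U").replace("t", "u")
```

A content-based check like this only inspects the letters present, so it says
nothing about ambiguous vs. unambiguous alphabets.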


(In reply to comment #1)
> So the GenBank file says in its header "ss-RNA" (single stranded RNA), and we
> obey this and set an RNA alphabet.  However, as you point out, the actual
> sequence contains T not U, so is in fact given as DNA!
> 
> Is this a bug in GenBank maybe...?
> 
> Do you think we should switch the T's into U's to match the stated alphabet, or
> simply return a DNA sequence based on the content?
> 

