BLAST, X vs. U, and EMBOSS

David Mathog mathog at mendel.bio.caltech.edu
Tue Jun 4 18:34:25 UTC 2002


> In nr there is an entry with gi= 2018236 which ends with:

Sorry, I dropped a character in the cut and paste, it's: 12018236

Tao Tao <tao at ncbi.nlm.nih.gov> points out that U is the IUPAC
one-letter code for selenocysteine; see:

http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html#AA212

This gets very confusing because Entrez returns GenBank format
with U converted to X, but FASTA (and ASN.1) with U left as U.
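
To make the U vs. X difference concrete, here is a minimal Python
sketch (not anything Entrez or EMBOSS actually runs). It reports where
U occurs in a protein sequence and produces the U->X masked form that
the GenBank-format output shows. The function names and the example
sequence are made up for illustration.

```python
def find_selenocysteine(seq):
    """Return the 0-based positions of U (selenocysteine) in a protein sequence."""
    return [i for i, residue in enumerate(seq.upper()) if residue == "U"]


def mask_selenocysteine(seq):
    """Replace U with X, preserving case, to mimic the U->X substitution."""
    return seq.replace("U", "X").replace("u", "x")


if __name__ == "__main__":
    example = "MKTUAGLU"  # hypothetical sequence, not a real database entry
    print(find_selenocysteine(example))
    print(mask_selenocysteine(example))
```

Comparing the masked and unmasked forms of the same entry is a quick
way to check which of the two conventions a given download used.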

Which protein alphabet is EMBOSS supposed to recognize?

And all that aside, X vs. X or X vs. U in blastp really does
introduce two unnecessary gaps in the alignment, which is
easily demonstrated with bl2seq on gi 14250938 vs. itself.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech



