BLAST, X vs. U, and EMBOSS
David Mathog
mathog at mendel.bio.caltech.edu
Tue Jun 4 18:34:25 UTC 2002
> In nr there is an entry with gi= 2018236 which ends with:
Sorry, I dropped a character in the cut and paste; the gi is 12018236.
Tao Tao <tao at ncbi.nlm.nih.gov> points out that U is the IUPAC
code for selenocysteine, see:
http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html#AA212
This gets very confusing because Entrez returns GenBank format
with U converted to X, but FASTA (and ASN.1) with U left as U.
Which protein alphabet is EMBOSS supposed to recognize?
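To make the U vs. X mismatch concrete, here is a rough Python sketch. It
is not how Entrez or EMBOSS actually handle this; the function names and
the example sequence are made up for illustration. It only shows how the
same protein ends up with different residues depending on whether U is
kept or masked to X.

```python
def find_selenocysteines(seq):
    """Return 1-based positions of U (selenocysteine) in a protein sequence."""
    return [i for i, aa in enumerate(seq.upper(), start=1) if aa == "U"]


def mask_selenocysteines(seq):
    """Replace U with X (case preserved), mimicking the U->X view described above."""
    return seq.replace("U", "X").replace("u", "x")


# Made-up sequence, not a real database entry.
fasta_style = "MKTUAGLLUX"
genbank_style = mask_selenocysteines(fasta_style)

print(find_selenocysteines(fasta_style))    # positions where U is present
print(find_selenocysteines(genbank_style))  # empty once U has been masked
print(genbank_style)
```

Once U has been masked there is no way to tell a former U from a
genuinely unknown residue, so the two forms are not interchangeable.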
And all that aside, X vs. X or X vs. U in blastp really does
introduce two unnecessary gaps in the alignment, which
can be easily demonstrated with bl2seq on gi 14250938 vs. itself.
Regards,
David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech