compseq: is U an amino acid?

Gary Williams, Tel 01223 494522 gwilliam at
Wed Aug 21 08:18:06 UTC 2002

U is the one-letter code for the amino acid selenocysteine.

See the IUPAC documentation for the one-letter amino-acid codes:


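For reference, here is a small Python sketch of how the codes mentioned in this thread can be classified. The 20 standard residues and U (selenocysteine) are real residues. B, Z and X are IUPAC ambiguity codes, each standing for more than one possible residue. The table covers only the codes discussed here. The names `RESIDUES`, `AMBIGUITY` and `classify` are hypothetical and are not part of EMBOSS.

```python
# Real residues: the 20 standard amino acids plus selenocysteine (U).
RESIDUES = {
    "A": "Alanine", "R": "Arginine", "N": "Asparagine", "D": "Aspartic acid",
    "C": "Cysteine", "Q": "Glutamine", "E": "Glutamic acid", "G": "Glycine",
    "H": "Histidine", "I": "Isoleucine", "L": "Leucine", "K": "Lysine",
    "M": "Methionine", "F": "Phenylalanine", "P": "Proline", "S": "Serine",
    "T": "Threonine", "W": "Tryptophan", "Y": "Tyrosine", "V": "Valine",
    "U": "Selenocysteine",
}

# Ambiguity codes: each one denotes a set of possible residues, not a single one.
AMBIGUITY = {
    "B": "Asx (Asp or Asn)",
    "Z": "Glx (Glu or Gln)",
    "X": "Xaa (unknown or other)",
}


def classify(code: str) -> str:
    """Return 'residue', 'ambiguity' or 'unrecognised' for a one-letter code."""
    c = code.upper()
    if c in RESIDUES:
        return "residue"
    if c in AMBIGUITY:
        return "ambiguity"
    return "unrecognised"


print("U:", classify("U"), "-", RESIDUES["U"])  # U: residue - Selenocysteine
```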
> "JAEN (Jacob Engelbrecht)" wrote:
> I have been using compseq for protein sequences and wondered why 'U'
> is reported as an amino acid.
> I looked in the code (nucleus/embnmer.c) and found it was specifically
> accounted for, whereas 'X', which many databases use to denote an
> unknown residue, is not.
> Would it not make sense to have options that include specific symbols
> in the alphabet or leave them out, for example:
> -leaveout XU or -include BZXU
> Jacob Engelbrecht, Phd
> Insulin Research
> Novo Nordisk
> 6A1.038 Novo Alle
> DK-2880 Bagsvaerd
> Denmark
> tel: +45 4442 4403
> mail: jaen at
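The options proposed above can be sketched in Python as a composition count over a configurable alphabet. This is a minimal illustration under stated assumptions. `-include` and `-leaveout` are Jacob's suggested options, not existing compseq qualifiers. The default alphabet (the 20 standard residues plus U) is chosen here to mirror the behaviour he describes, not copied from nucleus/embnmer.c. The function name `composition` is hypothetical.

```python
from collections import Counter

# Hypothetical default: 20 standard residues plus selenocysteine (U),
# with no ambiguity codes (B, Z, X).
DEFAULT_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY") | {"U"}


def composition(seq: str, include: str = "", leaveout: str = ""):
    """Count residues over a configurable alphabet.

    include / leaveout play the role of the proposed -include / -leaveout
    options. Characters outside the final alphabet are tallied separately
    as 'other' rather than silently dropped.
    """
    alphabet = (DEFAULT_ALPHABET | set(include.upper())) - set(leaveout.upper())
    seq = seq.upper()
    counts = Counter(ch for ch in seq if ch in alphabet)
    other = sum(1 for ch in seq if ch not in alphabet)
    # Report every alphabet symbol, including those with a zero count.
    return {aa: counts.get(aa, 0) for aa in sorted(alphabet)}, other


counts, other = composition("MKUXBX", include="BZX")
print(counts["X"], counts["U"], other)  # 2 1 0
```

Counting excluded symbols as "other" keeps the totals equal to the sequence length, so leaving out U or X changes only how residues are binned, not how many are counted.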

Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at  
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK

More information about the EMBOSS mailing list