[EMBOSS] extract translations from genbank file

Karin Lagesen karin.lagesen at bio.uio.no
Tue Aug 24 17:43:49 UTC 2010


Hello.

I am trying to extract the protein translations from a GenBank file. I 
can get the protein sequences, but I would also like to extract the 
information that accompanies each translation in the file.

Example:

     CDS             complement(93..1919)
                     /translation="MTLENTSPNPSQISLNLSGGIALGAYMAGVCFELVRQARKDNSP
                     LLIDLITGASAGAMTGAITAYYLLNREISNTEYESQNILQRAWVEKADMKDIDTVFAI
                     EDYRQVLNNLFKSQNESLLSQKGIKNIANLITENTDQLKVHQPLALVMTVTNLQGLLV
                     /product="hypothetical protein"

I have found two programs that could solve this problem. Coderet does 
give me the protein sequences, but the fasta description lines of the 
proteins are not easily relatable back to the genbank file.

>unknown_pro_1
MTLENTSPNPSQISLNLSGGIALGAYMAGVCFELVRQARKDNSPLLIDLITGASAGAMTG
AITAYYLLNREISNTEYESQNILQRAWVEKADMKDIDTVFAIEDYRQVLNNLFKSQNESL
LSQKGIKNIANLITENTDQLKVHQPLALVMTVTNLQGLLV

Extractfeat gives me sensible description lines, but so far I have not 
been able to make it output the protein rather than the DNA sequence.

>scaffold00002_93_1919 [CDS] Contig scaffold00002
atgaccctagaaaatacctctcccaatcctagtcaaatttccctaaatttgtcgggagga
attgccctcggagcttatatggctggggtgtgttttgaattagttagacaagccagaaaa
gacaattctcccctgttaattgatttgattaccggagcatctgctggggcgatgaccgga
....


So, are there any other programs I should be using, or any 
options/switches to the ones I have mentioned that would do this?
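
For illustration, the output being asked for (the protein sequence under a 
header that maps back to the feature) can be sketched outside EMBOSS. What 
follows is a minimal Python sketch using only the standard library. It 
assumes the standard GenBank column layout (feature keys indented by five 
spaces, qualifiers indented further) and a single record per file. The 
function names `parse_features` and `cds_fasta` and the header format are 
illustrative choices, not the behaviour of coderet or extractfeat.

```python
import re

# Assumption: standard GenBank layout, feature key starting in column 6.
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+(\S+)\s*$")
QUALIFIER = re.compile(r"^/(\w+)(?:=(.*))?$")


def parse_features(genbank_text):
    """Return a list of dicts with 'key', 'location' and 'quals' per feature."""
    features, current, name = [], None, None
    in_table = False
    for line in genbank_text.splitlines():
        if line[:1] not in (" ", ""):
            # A new top-level section; only FEATURES opens the feature table.
            in_table = line.startswith("FEATURES")
            continue
        if not in_table or not line.strip():
            continue
        m = FEATURE_LINE.match(line)
        if m:
            current = {"key": m.group(1), "location": m.group(2), "quals": {}}
            features.append(current)
            name = None
            continue
        if current is None:
            continue
        text = line.strip()
        # An odd number of quotes means the previous value is still open.
        value_open = name is not None and current["quals"][name].count('"') % 2 == 1
        q = QUALIFIER.match(text)
        if q and not value_open:
            name = q.group(1)
            current["quals"][name] = q.group(2) or ""
        elif name is None:
            current["location"] += text      # location wrapped over lines
        else:
            sep = "" if name == "translation" else " "
            current["quals"][name] += sep + text
    return features


def cds_fasta(genbank_text):
    """Build FASTA text from the /translation qualifier of every CDS."""
    locus = re.search(r"^LOCUS\s+(\S+)", genbank_text, re.M)
    seq_id = locus.group(1) if locus else "unknown"
    records = []
    for feat in parse_features(genbank_text):
        quals = feat["quals"]
        if feat["key"] != "CDS" or "translation" not in quals:
            continue
        protein = quals["translation"].strip('"').replace(" ", "")
        product = quals.get("product", "").strip('"')
        coords = re.findall(r"\d+", feat["location"]) or ["?"]
        header = f">{seq_id}_{coords[0]}_{coords[-1]} [CDS] {feat['location']} {product}"
        lines = [protein[i:i + 60] for i in range(0, len(protein), 60)]
        records.append("\n".join([header.rstrip()] + lines))
    return "\n".join(records)
```

Calling `cds_fasta(open(path).read())` on a GenBank file (with `path` being 
whatever file is at hand) would return one FASTA record per annotated CDS, 
with the location and product carried in the description line.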


TIA,

Karin

-- 
Karin Lagesen
Post Doc
Centre of Ecological and Evolutionary Synthesis (CEES)
Department of Biology
University of Oslo
P.O. Box 1066 - Blindern
N-0316 Oslo
Norway


