[EMBOSS] extract translations from genbank file
Karin Lagesen
karin.lagesen at bio.uio.no
Tue Aug 24 17:43:49 UTC 2010
Hello.
I am trying to extract the protein translations from a GenBank file. I
can get the protein sequences out, but I would also like to keep the
feature information that goes with each translation (for example the
location and product).
Example:
CDS complement(93..1919)
/translation="MTLENTSPNPSQISLNLSGGIALGAYMAGVCFELVRQARKDNSP
LLIDLITGASAGAMTGAITAYYLLNREISNTEYESQNILQRAWVEKADMKDIDTVFAI
EDYRQVLNNLFKSQNESLLSQKGIKNIANLITENTDQLKVHQPLALVMTVTNLQGLLV
/product="hypothetical protein"
I have found two programs that could solve this problem. coderet does
give me the protein sequences, but the FASTA description lines of the
proteins cannot easily be related back to the GenBank file.
>unknown_pro_1
MTLENTSPNPSQISLNLSGGIALGAYMAGVCFELVRQARKDNSPLLIDLITGASAGAMTG
AITAYYLLNREISNTEYESQNILQRAWVEKADMKDIDTVFAIEDYRQVLNNLFKSQNESL
LSQKGIKNIANLITENTDQLKVHQPLALVMTVTNLQGLLV
extractfeat gives me sensible description lines, but so far I have not
been able to make it output the protein instead of the DNA sequence.
>scaffold00002_93_1919 [CDS] Contig scaffold00002
atgaccctagaaaatacctctcccaatcctagtcaaatttccctaaatttgtcgggagga
attgccctcggagcttatatggctggggtgtgttttgaattagttagacaagccagaaaa
gacaattctcccctgttaattgatttgattaccggagcatctgctggggcgatgaccgga
....
So: are there any other programs I should be using, or options/switches
to the two I have mentioned that would give me this?
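
To make the desired output concrete, here is a minimal Python sketch of
the fallback I have in mind. It is not an EMBOSS feature. It reads the
/translation and /product qualifiers of each CDS feature straight from
the GenBank text and writes FASTA records whose description lines carry
the feature location. The function names (cds_translations, to_fasta)
and the header layout are my own hypothetical choices, and the parser
assumes every quoted qualifier value is properly closed with a double
quote.

```python
import re

def cds_translations(genbank_text):
    """Return (location, product, protein) tuples, one per CDS feature
    that carries a /translation qualifier."""
    features = []     # each entry: {"key", "location", "quals"}
    open_qual = None  # name of a quoted qualifier whose value is still open
    for raw in genbank_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if open_qual is not None:
            # Continuation line of a multi-line quoted value.
            features[-1]["quals"][open_qual].append(line.rstrip('"'))
            if line.endswith('"'):
                open_qual = None
            continue
        if line.startswith("/"):
            m = re.match(r'/(\w+)=(.*)$', line)
            if m and features:
                name, value = m.groups()
                features[-1]["quals"][name] = [value.strip('"')]
                closed = len(value) > 1 and value.endswith('"')
                if value.startswith('"') and not closed:
                    open_qual = name
            continue
        # Assumption: a feature line is exactly "KEY LOCATION" (two tokens).
        m = re.match(r'(\S+)\s+(\S+)$', line)
        if m:
            features.append({"key": m.group(1),
                             "location": m.group(2),
                             "quals": {}})
    out = []
    for feat in features:
        quals = feat["quals"]
        if feat["key"] == "CDS" and "translation" in quals:
            protein = "".join(quals["translation"])
            product = " ".join(quals.get("product", ["unknown"]))
            out.append((feat["location"], product, protein))
    return out

def to_fasta(entries, seq_name, width=60):
    """Format the tuples as FASTA; the header layout is an arbitrary choice."""
    lines = []
    for location, product, protein in entries:
        lines.append(f">{seq_name}_{location} [CDS] {product}")
        lines.extend(protein[i:i + width]
                     for i in range(0, len(protein), width))
    return "\n".join(lines) + "\n"
```

Passing the text of the GenBank file to cds_translations and the result,
together with the sequence name (e.g. "scaffold00002"), to to_fasta
would give protein records whose headers can be matched back to the
feature table. I would still prefer an EMBOSS-native way of doing this.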
TIA,
Karin
--
Karin Lagesen
Post Doc
Centre of Ecological and Evolutionary Synthesis (CEES)
Department of Biology
University of Oslo
P.O. Box 1066 - Blindern
N-0316 Oslo
Norway