[Bioperl-l] parsing coded_by subfeature
Will Fischer
wfischer at uts.cc.utexas.edu
Mon Jul 21 15:54:05 EDT 2003
On Sunday, July 20, 2003, at 01:33 PM, Jack Chen wrote:
> I am also curious how to handle the cases where the 'coded_by'
> subfeature contains the ">" and "<" signs. I am not really sure what
> they mean. And I noticed that wherever these signs appear, the protein
> sequences retrieved are different from the conceptual translation from
> the nucleotide sequences. For example:
>
> [nchen at whey blast_db_checked]$ ./test.pl "gi|8573628|gb|AAF77462.1|"
> Protein obtained from GenBank:
> MPQMAPISWLLLFIIFSITFILFCSINYYSYMPNSPKSNELKNINLNSMNWKW
> CDS sequence is: [deleted]
> Conceptual translation is:
> IPQIAPIR*LLLFIIFSITFILFCSINYYSYMPNSPKSNELKNINLNSIN*K**
and
> I think I was confused by the fact that the protein sequence provided
> by the GenBank does not match that conceptually translated sequence.
> Say, most of the nucleotide sequences (after joining together) are
> actually longer than the protein sequences.
"<" means "before this base"; ">", "after this base". These are
typically used for features that are not completely included in the
sequence, or (less often) where the actual start and end of a feature
is not precisely known.
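
A minimal sketch of how these partial-boundary markers could be picked
out of a simple location string. It is plain Python rather than Bioperl,
the function name parse_simple_location and the accession AF000001.1 are
made up for illustration, and it only handles the bare "start..end" form
(no join() or complement() operators).

```python
import re

# Optional "accession:" prefix, optional "<" before the start,
# optional ">" before the end, e.g. "AF000001.1:<1..>159".
LOCATION_RE = re.compile(
    r"^(?:(?P<acc>[\w.]+):)?(?P<lt><?)(?P<start>\d+)\.\.(?P<gt>>?)(?P<end>\d+)$"
)


def parse_simple_location(text):
    """Parse a simple location and flag partial (fuzzy) boundaries."""
    match = LOCATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported location string: {text!r}")
    return {
        "accession": match.group("acc"),        # None when no prefix is given
        "start": int(match.group("start")),
        "end": int(match.group("end")),
        "start_partial": match.group("lt") == "<",  # feature begins before this base
        "end_partial": match.group("gt") == ">",    # feature ends after this base
    }


# Hypothetical coded_by value, not taken from a real record.
partial = parse_simple_location("AF000001.1:<1..>159")
complete = parse_simple_location("10..99")
```

A feature flagged as partial at its start may not begin on a codon
boundary that is visible in the record, so a translation of it should
not be expected to begin with a start codon.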
As for the differences in translation, they're due to the /transl_table
qualifier: different critters use different genetic codes, and
mitochondria (as in your example) deviate from the standard code.
Any code for translating GenBank entries ought to take these
translation tables into account. One example (in a non-bioperl
context) can be seen in my standalone translation script, nt2aa, at
http://sunflower.bio.indiana.edu/~wfischer/Perl_Scripts/#nt2aa .
Mitochondria, in particular, use the canonical stop codon UGA to encode
tryptophan (W); I'm guessing that premature stop codons (introduced by
failure to account for this) explain your observed "nucleotide
sequences ... longer than the protein sequences".
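
To illustrate the effect of the translation table, here is a small
sketch in plain Python (not Bioperl). It builds the standard code and a
variant with four reassignments that match the invertebrate
mitochondrial code (NCBI table 5): TGA to W, ATA to M, AGA and AGG to S.
The choice of table 5 is an assumption based on the I/M, R/S and */W
differences in the two sequences quoted above; the record's own
/transl_table value is the authority. The DNA string is made up for the
demonstration, since the real CDS was deleted from the quoted message.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AAS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)


def build_table(overrides=None):
    """Return a codon -> amino acid dict, optionally with reassignments."""
    codons = ("".join(triplet) for triplet in product(BASES, repeat=3))
    table = dict(zip(codons, STANDARD_AAS))
    table.update(overrides or {})
    return table


STANDARD = build_table()
# Assumed to correspond to the invertebrate mitochondrial code.
INVERT_MITO = build_table({"TGA": "W", "ATA": "M", "AGA": "S", "AGG": "S"})


def translate(dna, table):
    """Translate frame 1; unknown codons become 'X', a trailing partial codon is dropped."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# Hypothetical CDS fragment, invented for this example.
demo_cds = "ATACCTCAAATAGCTCCTATTAGATGATTA"
standard_protein = translate(demo_cds, STANDARD)   # "IPQIAPIR*L"
mito_protein = translate(demo_cds, INVERT_MITO)    # "MPQMAPISWL"
```

With the standard table the made-up fragment hits a stop at TGA, while
the mitochondrial table reads through it as tryptophan, which is the
same pattern as in the two sequences quoted above.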
_____________________________________________________________
Will Fischer wfischer at uts.cc.utexas.edu
University of Texas at Austin Lab Ph.: 512-232-7114
Integrative Biology Lab Fax: 512-471-3878
1 University Station C0930
Austin, TX 78712-0253
_____________________________________________________________