[Bioperl-l] Bug or special design of the 'length' method for Bio::Seq ?

Chris Fields cjfields at illinois.edu
Sun Jul 17 14:20:48 UTC 2011


length() is defined in BioPerl as 'Get the length of the sequence in number of symbols (bases or amino acids)'. We count '*' as a translated codon, and therefore as part of length(), for the reasons Peter mentions. One can also set the length explicitly for a 'virtual' sequence (one with no actual sequence present), but if a sequence is present the length is not supposed to lie (i.e. you can't just set it to an arbitrary value).
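
To make the two ways of counting concrete, here is a minimal sketch in plain Python on ordinary strings (not Bio::Seq itself). The helper names symbol_length and residue_length are hypothetical, made up for illustration, and are not part of BioPerl or Biopython. If you want the length without stop symbols, stripping them yourself along these lines should be enough:

```python
def symbol_length(seq: str) -> int:
    """Length in symbols, counting '*' like any other symbol."""
    return len(seq)


def residue_length(seq: str, stop_symbol: str = "*") -> int:
    """Length with every stop symbol excluded (hypothetical helper)."""
    return len(seq) - seq.count(stop_symbol)


# The two example sequences from this thread.
protein = "MAASEHRCVGCGFRVKSLF*"
readthrough = "MAASEHRCVGCGFRVKSLF*AMKLMNO*P"

print(symbol_length(protein))       # 20, stop symbol counted
print(residue_length(protein))      # 19, stop symbol excluded
print(symbol_length(readthrough))   # 29, both internal stops counted
print(residue_length(readthrough))  # 27, both internal stops excluded
```

Note that the second sequence has no single obvious "protein length" once internal stops are involved, which is why counting every symbol is the simpler default.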

chris

On Jul 17, 2011, at 8:40 AM, Peter Cock wrote:

> This is deliberately giving the length of the string (Biopython does the same).
> 
> Have you considered what you would expect for this example sequence,
> i.e. where you translate a whole sequence including all the stop
> codons?
> 
> >Translation
> MAASEHRCVGCGFRVKSLF*AMKLMNO*P
> 
> It is a practical decision to give the length including the stop
> symbols, so that the sequence behaves like a Perl string.
> 
> Peter
> 
> On 7/17/11, Tao Zhu <tzhu at mail.bnu.edu.cn> wrote:
>> Hi, everyone
>> Suppose we have a protein sequence like:
>> 
>> >Protein
>> MAASEHRCVGCGFRVKSLF*
>> 
>> Do you think the length of such a sequence is 19 or 20? In my opinion,
>> the star "*" is only a terminal symbol of a protein sequence, so it
>> shouldn't be counted in the protein length. But in fact the "length"
>> method of Bio::Seq returns a length of 20.
>> 
>> --
>> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
>> 100875, China
>> Email: tzhu at mail.bnu.edu.cn
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 




