[Biopython-dev] Re: Seq is broken in CVS

f.sohm at whsmithnet.co.uk
Thu Aug 26 11:32:15 EDT 2004


Michael Hoffman writes: 

> On Thu, 26 Aug 2004, Michiel Jan Laurens de Hoon wrote: 
> 
>> The complement and reverse_complement were added to Seq.py because previously
>> several implementations of similar functions existed in different parts of
>> Biopython. The function forward_complement in Bio.GFF.easy takes a Seq object
>> and returns a Seq object. The function complement in Bio.SeqUtils takes a string
>> and returns a string. I have a slight preference for returning a Seq object for
>> consistency with MutableSeq, and also because a user might expect to receive an
>> object of the same type.
> 
> I understand what Fred was saying but I agree with Michiel's logic
> here. If I started with a Seq I would expect to get one back from a
> reverse function.
> -- 
> Michael Hoffman <hoffman at ebi.ac.uk>
> European Bioinformatics Institute
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev
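To make the return-type question concrete, here is a minimal, self-contained sketch (not Biopython's code) of the behaviour Michiel and Michael describe: complement and reverse_complement return whatever type they were given, so a string in gives a string out and a sequence object in gives an object of the same class out. `SimpleSeq` is a hypothetical stand-in for Seq, and the complement table is an assumption covering only unambiguous DNA letters.

```python
# Hypothetical complement table: unambiguous DNA letters only, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


class SimpleSeq:
    """Hypothetical, minimal stand-in for an immutable sequence object such as Seq."""

    def __init__(self, data: str) -> None:
        self.data = data

    def __eq__(self, other: object) -> bool:
        return isinstance(other, SimpleSeq) and self.data == other.data

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.data!r})"


def complement(seq):
    """Return the complement, preserving the input type (str -> str, SimpleSeq -> SimpleSeq)."""
    if isinstance(seq, str):
        return seq.translate(_COMPLEMENT)
    if isinstance(seq, SimpleSeq):
        # type(seq)(...) keeps subclasses intact instead of downgrading them to the base class.
        return type(seq)(seq.data.translate(_COMPLEMENT))
    raise TypeError(f"unsupported sequence type: {type(seq).__name__}")


def reverse_complement(seq):
    """Reverse the complement, again returning the same type as the input."""
    comp = complement(seq)
    if isinstance(comp, str):
        return comp[::-1]
    return type(comp)(comp.data[::-1])


print(complement("ATGC"))                     # TACG
print(reverse_complement(SimpleSeq("ATGC")))  # SimpleSeq('GCAT')
```

The trade-off in the thread is visible here: callers who start with a Seq keep a Seq, while code that mixes strings and Seq objects has to be aware of which one it is holding.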

Hi, 

I see your point. But I still think that, for some applications, the
risk of making a silly mistake outweighs the benefit. 


In fact, thinking about it: would this not be a good time to add another
set of classes to handle sequences, one each for DNA, RNA and protein? 

I mean, Seq and MutableSeq do their job and do it well,
but they are mainly database-oriented. 

In plain English, what Seq and MutableSeq do is, more or less:
'hold the boring lines of repetitive letters that people keep
adding at the end of their GenBank/EMBL records.' ;-) 

Seq and MutableSeq do not 'care' what the sequence they hold really
is. 

Hence the lack of consistency between the Alphabet and the data
(see Michiel's mail some time ago).
You also get a problem with representing biological forms of DNA,
for example: Seq does not distinguish between a circular sequence and
a linear one, and you can add two DNAs cut with incompatible restriction
enzymes without any complaint, etc. 
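The two DNA problems above can be made concrete with a small, hypothetical sketch of a DNA class that records its topology and the overhang left at each end, and refuses to join ends that do not match. Everything here (the `DnaFragment` name, the `circular`, `left_overhang` and `right_overhang` fields) is an assumption for illustration, not an existing Biopython API. The compatibility rule is deliberately simplified: two ends match only if they carry the identical overhang, which holds for palindromic sites such as BamHI and BglII (both GATC) versus EcoRI (AATT), but it ignores 5'/3' overhang polarity and non-palindromic sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnaFragment:
    """Hypothetical double-stranded DNA fragment that knows its topology and its ends."""

    seq: str                 # top strand, 5' -> 3' (junction bookkeeping is simplified)
    circular: bool = False   # a circular molecule has no free ends
    left_overhang: str = ""  # "" stands for a blunt end
    right_overhang: str = ""

    def __add__(self, other: "DnaFragment") -> "DnaFragment":
        # Nothing can be joined onto a molecule that has no free ends.
        if self.circular or other.circular:
            raise ValueError("cannot concatenate a circular molecule")
        # Simplified rule: two ends are compatible only if they carry the same overhang.
        if self.right_overhang != other.left_overhang:
            raise ValueError(
                f"incompatible ends: {self.right_overhang!r} vs {other.left_overhang!r}"
            )
        # Plain concatenation; a real implementation would also track the junction.
        return DnaFragment(
            seq=self.seq + other.seq,
            left_overhang=self.left_overhang,
            right_overhang=other.right_overhang,
        )


# Made-up fragments: BamHI and BglII both leave GATC overhangs, EcoRI leaves AATT.
bamhi_cut = DnaFragment("AAAA", right_overhang="GATC")
bglii_cut = DnaFragment("CCCC", left_overhang="GATC")
ecori_cut = DnaFragment("GGGG", left_overhang="AATT")
plasmid = DnaFragment("TTTT", circular=True)

joined = bamhi_cut + bglii_cut  # accepted: matching GATC ends
print(joined.seq)               # AAAACCCC

for left, right in [(bamhi_cut, ecori_cut), (bamhi_cut, plasmid)]:
    try:
        left + right
    except ValueError as err:
        # incompatible ends: 'GATC' vs 'AATT'
        # cannot concatenate a circular molecule
        print(err)
```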

I would like to know if people think it is worth going this way and
implementing these three types of class (maybe double-stranded and
single-stranded nucleic acids should be considered as well). 
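One way to read the proposal for separate DNA, RNA and protein classes is that each class owns its alphabet and checks the data against it at construction time, which would address the Alphabet/data mismatch mentioned above. A minimal sketch, assuming upper-case letter sets (the exact sets, including which IUPAC ambiguity codes to allow, are an assumption that would need discussing); the class names `BioSequence`, `DNA`, `RNA` and `Protein` are hypothetical.

```python
class BioSequence:
    """Hypothetical base class: stores letters and validates them against ALLOWED."""

    ALLOWED: frozenset = frozenset()

    def __init__(self, data: str) -> None:
        data = data.upper()
        bad = sorted(set(data) - self.ALLOWED)
        if bad:
            raise ValueError(f"{type(self).__name__} cannot contain {''.join(bad)!r}")
        self.data = data

    def __len__(self) -> int:
        return len(self.data)

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.data!r})"


class DNA(BioSequence):
    # Unambiguous bases plus N; the full IUPAC ambiguity set is left out for brevity.
    ALLOWED = frozenset("ACGTN")


class RNA(BioSequence):
    ALLOWED = frozenset("ACGUN")

    @classmethod
    def transcribe(cls, dna: DNA) -> "RNA":
        # Coding-strand convention: same letters with T replaced by U.
        return cls(dna.data.replace("T", "U"))


class Protein(BioSequence):
    # The 20 standard amino acids, X for unknown and * for stop.
    ALLOWED = frozenset("ACDEFGHIKLMNPQRSTVWY") | frozenset("X*")


print(RNA.transcribe(DNA("atgcgt")))  # RNA('AUGCGU')
try:
    DNA("AUGC")
except ValueError as err:
    print(err)                        # DNA cannot contain 'U'
```

Note that a string such as "ACGT" is also a valid protein here, so per-class validation catches some mix-ups (a U in DNA) but not all of them.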

I can throw in a set of classes to implement DNA. But it will only be
interesting if somebody else helps to tackle Protein. 

Anyway, if people are interested, I think it would be better to think it
through carefully beforehand. 

bye 

Fred 




