[Biopython-dev] Bio.SeqIO

Peter biopython-dev at maubp.freeserve.co.uk
Wed Mar 7 23:16:48 UTC 2007


I have renamed SequenceToDict and SequencesToAlignment to to_dict and 
to_alignment, which, as Chris Lasher pointed out, follows the PEP 8 Python 
style guide.

While there may be better places for these two functions to live, leaving 
them in SeqIO seems reasonable to me.  Still, if we do want to move 
them (or remove them) in the near future, it would be better to do this 
before releasing Biopython 1.43.

Other than that, I think Bio.SeqIO is "ready" for its first release.

Michiel Jan Laurens de Hoon wrote:
> It may be a good idea to add a keyword allow_identical_keys (probably a 
> better name is needed here), False by default, in SeqIO.parse to specify 
> if SeqIO.parse should raise an exception if two records with an 
> identical record.id are found. Whereas this is more of a problem when 
> creating a dictionary, I think that this is also relevant in general.

I'm not very keen on this "allow_identical_keys" option for SeqIO.parse().

However, if we do add it, I think the check could be done in the 
SeqIO.parse function itself (rather than repeating the code in each 
underlying parser).

One catch is that the exception would only get raised once a duplicate 
is found, possibly after the user has already processed the first half 
of the file.
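
As a rough sketch of what such a check might look like when wrapped 
around any record iterator (the names unique_ids and Record are made up 
for illustration and are not part of Bio.SeqIO), note how the first two 
records are handed to the caller before the duplicate triggers the error:

```python
from collections import namedtuple

# Hypothetical stand-in for SeqRecord; only the id attribute matters here.
Record = namedtuple("Record", ["id", "seq"])


def unique_ids(records):
    """Yield records one by one, raising ValueError on a repeated id."""
    seen = set()
    for record in records:
        if record.id in seen:
            raise ValueError("Duplicate record id: %s" % record.id)
        seen.add(record.id)
        yield record


records = [Record("a", "ACGT"), Record("b", "GGCC"), Record("a", "TTAA")]

processed = []
error_message = None
try:
    for record in unique_ids(records):
        # The caller has already acted on this record by the time a
        # later duplicate is detected.
        processed.append(record.id)
except ValueError as err:
    error_message = str(err)
```

Detecting the duplicate up front would mean reading the whole file before 
returning the first record, which defeats the point of an iterator.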

>> Also, wouldn't this prevent us making a SeqRecord 
>> inherit from Seq (another interesting idea you proposed in the past)?
> 
> Not necessarily; there are two ways to avoid this:
> A) SeqRecord could inherit both from list and from Seq;
> B) Instead of letting SeqRecord inherit from list, we could add a next() 
> and __iter__ method to the SeqRecord class (returning record.id and 
> record, and then StopIteration); this will also let us create a 
> dictionary with dict(SeqIO.parse(handle, format)).

I think I didn't make myself clear.  I wanted to reserve the __iter__ 
method of the SeqRecord class for use like this:

for residue in record :
     # assuming residue is also a SeqRecord object
     print residue.seq.tostring()

and similarly for __iter__ of a Seq class:

for residue in seq :
     # assuming residue is also a Seq object
     print residue.tostring()

To me this syntax seems very natural, but it does seem to block your 
clever dict() plan.
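
To illustrate the conflict, here is a minimal sketch using two made-up 
classes (PairRecord and ResidueRecord are hypothetical and are not the 
real SeqRecord).  The first iterates as (id, record), which is what dict() 
needs; the second iterates over residues, which dict() cannot use:

```python
class PairRecord:
    """Hypothetical record following option B: iterates as (id, record)."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __iter__(self):
        yield self.id
        yield self


class ResidueRecord:
    """Hypothetical record that iterates over its residues instead."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __iter__(self):
        for letter in self.seq:
            # Each residue is itself a (one letter) record.
            yield ResidueRecord(self.id, letter)


# dict() accepts an iterable of two-item iterables, so this works:
by_id = dict([PairRecord("a", "ACGT"), PairRecord("b", "GGCC")])

# Residue-wise iteration reads naturally...
residues = [residue.seq for residue in ResidueRecord("a", "ACGT")]

# ...but each record now yields four items rather than two, so dict()
# rejects it.
try:
    dict([ResidueRecord("a", "ACGT")])
    dict_failed = False
except ValueError:
    dict_failed = True
```

A class can only have one __iter__, so we would have to pick one of these 
two behaviours.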

Peter



