[Biopython-dev] Bio.SeqIO

Michiel Jan Laurens de Hoon mdehoon at c2b2.columbia.edu
Wed Mar 7 20:50:59 UTC 2007


Peter wrote:
>> The upshot is that we can now create a dictionary like this:
>>  >>> d = dict(SeqIO.parse(handle, format))
>> without any changes to Bio.SeqIO.
> 
> That is clever...
> 
>> Two things get lost here:
>> 1) We can't have a key_function to change how to choose the key.
>> 2) We're no longer checking if all keys are different. This can be 
>> fixed by saving the keys in the parser function and raising an 
>> exception if two identical keys are found. This implies though that 
>> the same exception is raised in all use cases of SeqIO.parse, which 
>> may not be what we want.
> 
> Sadly not ideal.

About 2):
It may be a good idea to add a keyword argument to SeqIO.parse, say 
allow_identical_keys (a better name is probably needed), False by default. 
With the default, SeqIO.parse would raise an exception if two records 
with an identical record.id are found. Although this matters most when 
creating a dictionary, I think it is also relevant in general.
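
A rough sketch of what such a check could look like, written as a 
separate generator wrapped around any iterator of records rather than as 
a change to SeqIO.parse itself. The names check_unique_ids and FakeRecord 
are made up for illustration (FakeRecord only stands in for SeqRecord), 
and the choice of ValueError is an assumption:

```python
class FakeRecord:
    """Hypothetical stand-in for SeqRecord; only the id attribute matters here."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


def check_unique_ids(records, allow_identical_keys=False):
    """Yield records unchanged; raise ValueError on a repeated record.id.

    The check is skipped entirely when allow_identical_keys is True.
    """
    seen = set()
    for record in records:
        if not allow_identical_keys:
            if record.id in seen:
                raise ValueError("Duplicate record id: %r" % (record.id,))
            seen.add(record.id)
        yield record
```

Since this is a generator, the exception is only raised once the second 
record with a given id is actually reached, so records before the 
duplicate are still delivered to the caller.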

Note though that if SeqIO.parse checks for identical keys automatically, 
there is not much left to do for SeqIO.to_dict.

Btw, a to_dict function may fit in better with Bio.SeqRecord, as it is 
not specifically related to sequence file IO.

> Also, wouldn't this prevent us making a SeqRecord 
> inherit from Seq (another interesting idea you proposed in the past)?

Not necessarily; there are two ways to avoid this:
A) SeqRecord could inherit both from list and from Seq;
B) Instead of letting SeqRecord inherit from list, we could add next() 
and __iter__ methods to the SeqRecord class (yielding record.id, then the 
record itself, and then raising StopIteration); this would also let us 
create a dictionary with dict(SeqIO.parse(handle, format)).
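
To illustrate option B, here is a minimal sketch using a made-up class 
PairRecord in place of SeqRecord. It assumes a current Python, where a 
generator-based __iter__ takes the place of a hand-written next() method; 
the generator raises StopIteration on its own after the second item:

```python
class PairRecord:
    """Hypothetical stand-in for SeqRecord that iterates as (id, record)."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __iter__(self):
        # dict() expects each item to unpack into exactly two values:
        # the key first, then the value.
        yield self.id
        yield self


# Any iterator of such records can be passed to dict() directly.
records = [PairRecord("a", "ACGT"), PairRecord("b", "GG")]
d = dict(iter(records))
```

Note that dict() silently keeps the last record when two records share 
an id, which is point 2) above.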

--Michiel.



-- 
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032


