[Biopython-dev] New Bio.SeqIO code

Michiel de Hoon mdehoon at c2b2.columbia.edu
Fri Nov 3 02:44:47 UTC 2006


Peter wrote:
> Chris Lasher wrote:
>> Peter wrote:
>>> One point against names like File2SequenceIterator is the pun on 
>>> two versus to (i.e. convert) will not be so obvious to non-native 
>>> English speakers.
>> I'd like to second that. It's cute, sure, but FileToSequenceIterator
>>  isn't that much more difficult, and leaves no room for confusion. 
>> (e.g., Where's the File1SequenceIterator?)
> 
> I would be happy with FileToSequenceIterator, or even
> FileToSequenceIter.  FileToSeqIter is shorter but we don't actually
> return Seq objects so I would avoid that.
> 
> Does anyone else have any suggestions?

Yes, but let's discuss function names after we decide which functions we 
want.

> While it does sound like a nice idea for the end user, the idea of
> filenames and handles is pretty important in Python, and maybe we
> shouldn't worry about forcing newcomers to deal with handles.  After all,
> the SeqIO system will make them deal with iterators and SeqRecords which
> I think are far more complicated!
> 
> What do you think Michiel?

My preferred solution would be for File2SequenceIterator to take handles 
only, the same as Bio.Blast:

from Bio.Blast import NCBIXML

# The parser is given an open handle, never a filename
blast_out = open('my_blast.out')
b_parser = NCBIXML.BlastParser()
b_record = b_parser.parse(blast_out)
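
To make the handle-only idea concrete, here is a minimal sketch. It is not the actual Bio.SeqIO code: the function name iterate_fasta and its simplified FASTA handling are assumptions made up for illustration. The only point is that the function accepts any open handle and never a filename:

```python
import io

def iterate_fasta(handle):
    """Yield (title, sequence) tuples from an open FASTA handle.

    Hypothetical helper for illustration only; it takes a handle,
    never a filename, so opening the file is left to the caller.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:].strip()
            chunks = []
        elif title is not None:
            chunks.append(line.strip())
    # Emit the last record in the file.
    if title is not None:
        yield title, "".join(chunks)

# Any file-like object works, for example an in-memory handle:
handle = io.StringIO(">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(iterate_fasta(handle))
```

Because only a handle is required, the same function would work on a file opened with open(), on a string buffer, or on a network stream.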

> Chris Lasher wrote:
>> Which brings me to the issue of "guessing" a file's format. Yikes, 
>> again! I'd expect that kind of "magickery" from Perl, but once again,
>> explicit is better than implicit. I honestly think it's not too much
>> to expect the user to know what filetype they're expecting BioPython
>> to deal with. Could you guys please explain the motivation behind 
>> this to me?
> ......
>
> I think Michiel and I were happy to leave this question for later...
> 
I am leaning towards Chris' opinion. File type guessing (from extension 
or file contents) doesn't really seem necessary. At least, I don't 
remember a user asking for it. The benefits of file type guessing from 
the extension are minimal (since a user can probably do that more 
reliably himself, knowing the file names he's likely to encounter). And 
since file type guessing will not be foolproof, it may even be 
confusing. Once file type guessing is available in Biopython though, 
we're committed to it and we'll have to support it. So I'd be happier 
without the file type guessing functionality.
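
As a rough illustration of why guessing from the extension is not foolproof, consider the sketch below. The extension table and the function name guess_format_from_extension are made up for this example and are not part of Biopython:

```python
import os

# Hypothetical extension table; real-world file names are far less tidy.
EXTENSION_TO_FORMAT = {
    ".fasta": "fasta",
    ".fa": "fasta",
    ".gb": "genbank",
    ".gbk": "genbank",
    ".aln": "clustal",
}

def guess_format_from_extension(filename):
    """Return a format name, or None when the extension tells us nothing."""
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_TO_FORMAT.get(extension)

# Some names can be guessed, others cannot.
guesses = {
    name: guess_format_from_extension(name)
    for name in ["genome.gbk", "reads.FA", "sequences.txt", "alignment"]
}
```

Names such as sequences.txt or alignment give no usable hint, so the user would have to state the format explicitly anyway.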

That said, if somebody really wants it, I can live with it.

--Michiel.



