[Biopython-dev] sff reader

Peter biopython at maubp.freeserve.co.uk
Thu Jul 23 09:34:26 UTC 2009


On Wed, Jul 22, 2009 at 9:51 PM, James Casbon<casbon at gmail.com> wrote:
>
> 2009/7/22 Peter <biopython at maubp.freeserve.co.uk>:
>> On Wed, Jul 22, 2009 at 7:16 PM, James Casbon<casbon at gmail.com> wrote:
>>>
>>> A bit late to the party, but I put my sff parsing code into this fork
>>> before reading this thread:
>>> http://github.com/jamescasbon/biopython/tree/sff
>>
>> Sounds interesting - github is being very slow for me right now,
>> so I'll probably take a look tomorrow. I'll be interested to see how
>> it compares to my rough code on Bug 2837 based on the code
>> from Jose Blanca (this doesn't do paired end reads yet).
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2837
>
> I don't think there is much in it really.  You have a factored
> BinaryFile class, I have classes for the components of the SFF file.
> Both are based around struct.

Github is working fine now - maybe my wireless network was
just too slow at home last night?

Jose's code uses seek/tell which means it has to have a handle
to an actual file. He also used binary read mode - I'm not sure if
this was essential or not.

James' code seems to make a single pass through the file handle,
without using seek/tell to jump about. I think this is nicer, as it is
consistent with the other SeqIO parsers, and should work on
more types of handles (e.g. from gzip, StringIO, or even a
network connection).
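
As a rough sketch of the single-pass idea (this is not James' or
Jose's actual code), the following reads an SFF common header using
only read() calls, so it never needs seek/tell. The field layout is
assumed from the SFF common header description and should be checked
against the format documentation; the names read_exact and
read_sff_common_header are made up for illustration.

```python
import gzip
import io
import struct

# Assumed layout of the fixed part of the SFF common header (big endian):
# magic, 4 version bytes, index offset, index length, number of reads,
# header length, key length, flows per read, flowgram format code.
HEADER_FMT = ">4s4BQIIHHHB"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 31 bytes, no alignment padding


def read_exact(handle, n):
    """Read exactly n bytes using read() only (no seek/tell)."""
    chunks = []
    remaining = n
    while remaining:
        chunk = handle.read(remaining)
        if not chunk:
            raise ValueError("Premature end of file")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_sff_common_header(handle):
    """Parse the common header in one forward pass over the handle."""
    fields = struct.unpack(HEADER_FMT, read_exact(handle, HEADER_SIZE))
    magic = fields[0]
    version = fields[1:5]
    (index_offset, index_length, number_of_reads, header_length,
     key_length, flows_per_read, flowgram_format) = fields[5:]
    if magic != b".sff":
        raise ValueError("Not an SFF file (bad magic number)")
    flow_chars = read_exact(handle, flows_per_read)
    key_sequence = read_exact(handle, key_length)
    # The header is padded out to header_length; consume the padding by
    # reading it rather than seeking past it.
    padding = header_length - HEADER_SIZE - flows_per_read - key_length
    if padding < 0:
        raise ValueError("Header length is smaller than its contents")
    read_exact(handle, padding)
    return {
        "version": version,
        "index_offset": index_offset,
        "index_length": index_length,
        "number_of_reads": number_of_reads,
        "flow_chars": flow_chars,
        "key_sequence": key_sequence,
        "flowgram_format": flowgram_format,
    }


# Hypothetical 40 byte header, used only to exercise the function.
raw = struct.pack(HEADER_FMT, b".sff", 0, 0, 0, 1, 0, 0, 2, 40, 4, 4, 1)
raw += b"TACG" + b"TCAG" + b"\x00"

plain = read_sff_common_header(io.BytesIO(raw))
zipped = read_sff_common_header(
    gzip.open(io.BytesIO(gzip.compress(raw)), "rb")
)
```

The same function is used on an in-memory handle and on a gzip
handle, which is the kind of flexibility a seek/tell based parser
would not necessarily give.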

It looks like you (James) construct Seq objects using the full
untrimmed sequence as is. I was undecided on whether trimmed or
untrimmed should be the default, but the idea of some kind of
masked or trimmed Seq object had come up on the mailing list,
which might be useful here (and in contig alignments), i.e.
something which acts like a Seq object giving the trimmed
sequence, but which also contains the full sequence and trim
positions.
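
To make the trimmed Seq idea concrete, here is a minimal sketch in
plain Python. The class name TrimmedSeq and its attributes are
hypothetical and are not part of Biopython; a real version would
presumably build on the Seq class rather than on a plain string.

```python
class TrimmedSeq:
    """Hypothetical wrapper: behaves like the trimmed sequence, but
    keeps the full sequence and the trim positions available."""

    def __init__(self, full, trim_start, trim_end):
        # Trim positions use Python slice conventions (end is exclusive).
        if not 0 <= trim_start <= trim_end <= len(full):
            raise ValueError("Trim positions outside the sequence")
        self.full = full
        self.trim_start = trim_start
        self.trim_end = trim_end

    def __str__(self):
        return self.full[self.trim_start:self.trim_end]

    def __len__(self):
        return self.trim_end - self.trim_start

    def __getitem__(self, index):
        # Indexing and slicing act on the trimmed sequence only.
        return str(self)[index]


# Made-up read: a 4 base key, 8 good bases, then 4 bases to clip off.
read = TrimmedSeq("tcagACGTACGTnnnn", 4, 12)
trimmed = str(read)
```

Here str(read) and len(read) see only the trimmed bases, while
read.full, read.trim_start and read.trim_end still give access to
the untrimmed data.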

I also want to look at paired end reads in SFF files...

Peter



