[Biopython-dev] Sequential SFF IO
p.j.a.cock at googlemail.com
Wed Jan 26 19:44:10 UTC 2011
On Wednesday, January 26, 2011, Kevin Jacobs wrote:
>> How many output files do you have? Assuming it is small I'd go for
>> the simple solution of one loop over the input SFF file for each output
>> file.
>
> We're routinely multiplexing hundreds or thousands of samples per SFF
> file and using sequence barcodes to identify them. The number of outputs
> makes a one-pass solution much preferable. Anyhow, it seems that this has
> gone beyond the scope of generic Biopython, so I'm happy to make my
> modifications locally (and share the results if anyone is interested).
> We're currently using the Roche/454 SFF tools, but they have known bugs
> and we have 5' and 3' adapters to consider.
> Thanks, -Kevin
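For illustration, the one-pass barcode demultiplexing Kevin describes might look like the minimal sketch below. It assumes reads are plain `(read_id, sequence)` pairs and that barcodes are exact 5' prefixes. The names `demultiplex` and `unassigned` are hypothetical, and a real pipeline would stream each bucket to its own SFF writer rather than hold lists in memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for a SeqRecord: (read_id, sequence)
Record = Tuple[str, str]


def demultiplex(records: Iterable[Record],
                barcodes: Dict[str, str]) -> Dict[str, List[Record]]:
    """Assign each read to a sample by its 5' barcode in a single pass.

    barcodes maps barcode sequence -> sample name (hypothetical layout).
    Reads with no exact barcode match go to the "unassigned" bucket.
    """
    # Try longest barcodes first so a short barcode that is a prefix
    # of a longer one does not steal its reads.
    lengths = sorted({len(b) for b in barcodes}, reverse=True)
    buckets: Dict[str, List[Record]] = defaultdict(list)
    for read_id, seq in records:
        sample = "unassigned"
        for n in lengths:
            hit = barcodes.get(seq[:n])
            if hit is not None:
                sample = hit
                seq = seq[n:]  # strip the barcode as we go
                break
        buckets[sample].append((read_id, seq))
    return dict(buckets)
```

Each read is examined once, so the cost does not grow with the number of output files the way one loop per output would.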
I've got a better feel for what you are attempting to do now. I think
one avenue would be to extend the write_header method to take some
SFF-specific arguments and add a write_footer method taking the optional
Roche XML manifest, which would (assuming the output handle supports
seeking) write the index block and then update the header. All this may
not make much sense without looking at the code and the SFF format spec.
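A minimal sketch of the seek-back idea, assuming a seekable binary handle: write a header with placeholder values, stream the records, append the index block, then return to the start and patch the header. The `PatchingWriter` class and its four-field header are hypothetical and only loosely modelled on the SFF common header (which does record the index offset, index length and number of reads); this is not Biopython's actual SffIO code.

```python
import io
import struct
from typing import BinaryIO

# Hypothetical, simplified header: magic (4 bytes), index offset (uint64),
# index length (uint32), read count (uint32), all big-endian.
# NOT the full SFF common header, just the fields that must be patched.
HEADER = struct.Struct(">4sQII")


class PatchingWriter:
    """Write a placeholder header, stream records, then seek back to fix it."""

    def __init__(self, handle: BinaryIO):
        if not handle.seekable():
            raise ValueError("output handle must support seek()")
        self.handle = handle
        self.count = 0

    def write_header(self) -> None:
        # Zeros as placeholders: the real values are unknown until the end.
        self.handle.write(HEADER.pack(b".sff", 0, 0, 0))

    def write_record(self, data: bytes) -> None:
        self.handle.write(data)
        self.count += 1

    def write_footer(self, index: bytes = b"") -> None:
        # Append the optional index block (e.g. a manifest) after the reads.
        index_offset = self.handle.tell() if index else 0
        self.handle.write(index)
        end = self.handle.tell()
        # Go back and overwrite the placeholder header in place.
        self.handle.seek(0)
        self.handle.write(HEADER.pack(b".sff", index_offset, len(index), self.count))
        self.handle.seek(end)


if __name__ == "__main__":
    buf = io.BytesIO()
    w = PatchingWriter(buf)
    w.write_header()
    for rec in (b"read1", b"read2"):
        w.write_record(rec)
    w.write_footer(b"<manifest/>")
    print(HEADER.unpack(buf.getvalue()[:HEADER.size]))
```

The seekable() check matters: on a pipe or stdout the header cannot be revisited, so a writer would have to buffer everything or know the read count up front.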
I'm currently looking at trimming 5' and 3' PCR primer sequences,
which could equally be used for barcodes etc. I'd probably wrap this
as a Galaxy tool (using Biopython).
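A rough sketch of exact-match primer trimming, assuming the 5' forward primer starts the read and the 3' end (if sequenced that far) carries the reverse complement of the reverse primer. `trim_primers` is a hypothetical helper written in plain Python; real 454 data usually needs mismatch-tolerant matching, so treat this only as an outline of the logic.

```python
from typing import Optional

# Complement table for plain DNA letters (other IUPAC codes are not handled).
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a plain DNA string."""
    return seq.translate(_COMP)[::-1]


def trim_primers(read: str, fwd: str, rev: str) -> Optional[str]:
    """Trim exact 5' and 3' primer matches from a read (hypothetical helper).

    Returns None when the read does not start with the forward primer.
    If the 3' primer site is not found (e.g. the read stops short),
    the insert is returned with only the 5' primer removed.
    """
    if not read.upper().startswith(fwd.upper()):
        return None
    insert = read[len(fwd):]
    # On the forward strand the reverse primer appears reverse-complemented.
    site = insert.upper().find(reverse_complement(rev).upper())
    return insert if site == -1 else insert[:site]
```

Swapping the primers for barcode sequences gives the barcode case with the same code.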