[Biopython-dev] Sequential SFF IO

Brad Chapman chapmanb at 50mail.com
Fri Jan 28 12:34:18 UTC 2011

Kevin and Peter;
I'm really enjoying this discussion -- thanks for talking this
through here.

> For just 5' barcode detection, I am using a memoized scheme that computes
> anchored alignments and then stores the result in a hash table
> (match/mismatch, edit distance).  This approach allows me to reject barcodes
> with too small an edit distance to the next best candidate.  It is
> reasonably fast for our fairly long 454 barcode set (10-mers), though I do
> have an optional Cython version of the edit distance routine.  The
> pure-Python version is pretty zippy and can decode a 454 run in a minute or
> two.

This sounds like a nice approach. Do you have code available or is
it not packaged up yet?
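
For readers following along, here is a rough sketch of how a memoized
scheme like the one quoted above might look. It is not Kevin's code.
The names (anchored_distance, make_decoder, max_dist, min_margin) and
the default thresholds are assumptions made up for illustration, and
all barcodes are assumed to have the same length.

```python
from functools import lru_cache


def anchored_distance(barcode, window):
    """Edit distance between barcode and the best-matching prefix of window.

    The alignment is anchored at the 5' end: both strings must start
    together, but the window may extend past the end of the barcode.
    """
    prev = list(range(len(window) + 1))
    for i, b in enumerate(barcode, 1):
        cur = [i]
        for j, w in enumerate(window, 1):
            cur.append(min(prev[j] + 1,            # gap in the read
                           cur[j - 1] + 1,         # gap in the barcode
                           prev[j - 1] + (b != w)))  # match or mismatch
        prev = cur
    # Any prefix of the window may be the end of the barcode.
    return min(prev)


def make_decoder(barcodes, max_dist=1, min_margin=1):
    """Build a decode(read) function for a set of equal-length barcodes."""
    barcodes = tuple(barcodes)
    size = len(barcodes[0])

    @lru_cache(maxsize=None)  # the hash table: read prefix -> result
    def decode_window(window):
        scored = sorted((anchored_distance(bc, window), bc) for bc in barcodes)
        best_dist, best = scored[0]
        if best_dist > max_dist:
            return None
        # Reject calls that are too close to the next best candidate.
        if len(scored) > 1 and scored[1][0] - best_dist < min_margin:
            return None
        return best

    def decode(read):
        # Allow for up to max_dist inserted bases in the read.
        return decode_window(read[:size + max_dist])

    return decode
```

Since many reads share the same first few bases, the cache means each
distinct prefix is aligned against the barcode set only once.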

I wrote up a barcode detector, remover and sorter for our Illumina
reads. There is nothing especially tricky in the implementation: it
looks for exact matches and then checks for approximate matches,
with gaps, using pairwise2:


The "best_match" function could be replaced with different
implementations, using the rest of the script as scaffolding to do
all of the other sorting, trimming and output.
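
As a rough illustration of that split between matcher and scaffolding,
here is a minimal sketch. It is not the script itself: the function
signatures, the "unmatched" bin and the assumption that the barcode
sits at the start of the read are all made up for the example. To keep
it dependency-free, an ungapped mismatch count stands in for the gapped
pairwise2 alignment in the approximate step.

```python
from collections import defaultdict


def best_match(read_end, barcodes, max_mismatches=1):
    """Return (name, sequence) of the matching barcode, or (None, None).

    Tries exact matches first, then falls back to an approximate match.
    The fallback here only counts mismatches (no gaps); it is a stand-in
    for a gapped alignment.
    """
    for name, seq in barcodes.items():
        if read_end == seq:
            return name, seq
    hits = []
    for name, seq in barcodes.items():
        if len(seq) != len(read_end):
            continue
        mismatches = sum(a != b for a, b in zip(read_end, seq))
        if mismatches <= max_mismatches:
            hits.append((mismatches, name, seq))
    hits.sort()
    # Accept only an unambiguous best hit.
    if hits and (len(hits) == 1 or hits[0][0] < hits[1][0]):
        return hits[0][1], hits[0][2]
    return None, None


def sort_by_barcode(reads, barcodes, matcher=best_match):
    """Bin (name, sequence) reads by barcode and trim the barcode off.

    Assumes equal-length barcodes at the start of each read (hypothetical
    layout). Any function with the same signature as best_match can be
    passed in as matcher.
    """
    size = len(next(iter(barcodes.values())))
    bins = defaultdict(list)
    for name, seq in reads:
        bc_name, _ = matcher(seq[:size], barcodes)
        if bc_name is None:
            bins["unmatched"].append((name, seq))
        else:
            bins[bc_name].append((name, seq[size:]))
    return dict(bins)
```

Swapping in a different detection strategy then only means passing a
different matcher; the binning and trimming stay the same.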

