[Biopython-dev] Sequential SFF IO

Peter Cock p.j.a.cock at googlemail.com
Mon Feb 14 13:19:45 UTC 2011


On Mon, Feb 14, 2011 at 1:01 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Peter;
>
>> Do you have (or can you point me at) any good sample data with
>> barcodes, or custom adapters or primer sequences? e.g. some SRA
>> numbers you've been using.
>
> This is a subset of two lanes from a barcoded flowcell for testing
> purposes:
>
> http://chapmanb.s3.amazonaws.com/110106_FC70BUKAAXX.tar.gz
>
> It has 12 barcoded samples, using the Illumina barcodes. The
> sequences are in this YAML file:
>
> https://github.com/chapmanb/bcbb/blob/master/nextgen/tests/data/automated/run_info.yaml
>

Great :)

>> I originally had three separate tools (with shared code) for working
>> with FASTA, FASTQ and SFF reads, which I have recently combined
>> into one single tool that does all three. Code here if anyone wants to
>> look at it.
>>
>> https://bitbucket.org/peterjc/galaxy-central/src/filter_fasta/tools/primers/
>
> Very nice. It would be great to get something general for barcode
> splitting as a Galaxy tool. Thanks for looking at this,
> Brad

Yes - assuming what they have already isn't good enough (at the
very least, the Galaxy barcode wrapper for fastx currently only
handles fastq-solexa, but I think that can be fixed).
http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-February/004290.html

I've been focused on the PCR case, where my sequences contain
IUPAC ambiguity characters. For barcodes that shouldn't be an
issue, but you may instead have more than one barcode and
will want one output file per barcode (although not usually as
complicated as Kevin's setup). I need to learn more about how
Galaxy handles multiple outputs before commenting on that.
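
As a rough illustration of the two cases above, here is a minimal
sketch in plain Python. It is not the code from the Galaxy tool linked
earlier; the function names (primer_to_regex, starts_with_primer,
split_by_barcode) and the example sequences are made up for this
example, and it assumes exact matching at the 5' end of the read with
no mismatches allowed.

```python
import re
from collections import defaultdict

# Standard IUPAC nucleotide ambiguity codes and the bases each one covers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer):
    """Compile an IUPAC primer into a regex anchored at the read start."""
    parts = []
    for letter in primer.upper():
        bases = IUPAC[letter]
        # Unambiguous letters match themselves; others become a character class.
        parts.append(bases if len(bases) == 1 else "[%s]" % bases)
    return re.compile("^" + "".join(parts))


def starts_with_primer(read, primer):
    """True if the read begins with a sequence compatible with the primer."""
    return primer_to_regex(primer).match(read.upper()) is not None


def split_by_barcode(reads, barcodes):
    """Bin (name, sequence) pairs by an exact 5' barcode match.

    barcodes maps a sample name to its barcode sequence. The barcode is
    trimmed from matching reads; unmatched reads are collected under None.
    """
    bins = defaultdict(list)
    for name, seq in reads:
        for sample, barcode in barcodes.items():
            if seq.upper().startswith(barcode.upper()):
                bins[sample].append((name, seq[len(barcode):]))
                break
        else:
            # No barcode matched this read.
            bins[None].append((name, seq))
    return dict(bins)


# Hypothetical example data, not taken from the test files mentioned above.
example_reads = [("r1", "ACGTGGGG"), ("r2", "TTAGCCCC"), ("r3", "GGGGGGGG")]
example_barcodes = {"s1": "ACGT", "s2": "TTAG"}
example_bins = split_by_barcode(example_reads, example_barcodes)
```

Each bin would then be written to its own output file (for example
with Bio.SeqIO.write once the reads are SeqRecord objects), which is
the multiple-outputs question on the Galaxy side.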

Peter


