[Biopython] Writing fasta+qual files and adjusting adapter clip positions in sff files

Martin Mokrejs mmokrejs at fold.natur.cuni.cz
Wed Apr 6 13:54:41 UTC 2011


Hi Peter,

Peter Cock wrote:
> On Tue, Apr 5, 2011 at 2:21 PM, Martin Mokrejs
> <mmokrejs at fold.natur.cuni.cz> wrote:
>> Hi Peter,
>>  I was looking into the Tutorial for a way to write fasta+qual files
>> but couldn't find it.
> 
> Maybe I don't understand your question, but Bio.SeqIO.write(...)
> can be used to save as FASTA or as QUAL (call it twice).

Ah, I forgot, right.

> 
>> I wanted to trim my objects assembled through
>> SeqIO.QualityIO.PairedFastaQualIterator.
>> Could _record[_start:_stop]  be used?
> 
> If I have understood your question correctly, yes. That function will
> parse a FASTA + QUAL pair and give you SeqRecord objects with
> sequence and quality. You can then slice each SeqRecord to apply
> trimming (underscores are not usual for variable names though).

OK, I just wasn't sure if the slicing works this way and was a bit lazy
to test myself to yield a new object with shorter sequences and qual values.

> 
>>  Anyways, I found that there is a way to convert sff files into re-trimmed
>> sff files which is even closer to my goal. Here is the help text from SffIO:
>>
>>        >>> from Bio import SeqIO
>>        >>> def filter_and_trim(records, primer):
>>        ...     for record in records:
>>        ...         if record.seq[record.annotations["clip_qual_left"]:].startswith(primer):
>>        ...             record.annotations["clip_qual_left"] += len(primer)
>>        ...             yield record
>>        >>> records = SeqIO.parse("Roche/E3MFGYR02_random_10_reads.sff", "sff")
>>        >>> count = SeqIO.write(filter_and_trim(records,"AAAGA"),
>>        ...                     "temp_filtered.sff", "sff")
>>        >>> print "Selected %i records" % count
>>        Selected 2 records
>>
> 
> That is showing how to edit the trim values in order to write out an
> updated SFF file.

Yes.

> 
>>
>> And this code from the Tutorial:
>>
>>>>> from Bio import SeqIO
>>>>> SeqIO.convert("E3MFGYR02_random_10_reads.sff", "sff-trim", "trimmed.fasta", "fasta")
>> 10
>>>>> SeqIO.convert("E3MFGYR02_random_10_reads.sff", "sff-trim", "trimmed.qual", "qual")
>> 10
>>>>> SeqIO.convert("E3MFGYR02_random_10_reads.sff", "sff-trim", "trimmed.fastq", "fastq")
>> 10
>>
>>  My questions is: could I provide readnames with clip_adapter_right directly to
>> SeqIO.convert()?
> 
> No, the Bio.SeqIO.convert(...) function is deliberately simple and inflexible.

Pity. Probably because the sff file can be indexed it would be fast if I provide
the function with a handle to (even in unsorted order):

GQF67IL01D9394  5-89
GQF67IL01AM9KN  5-87
GQF67IL01BIDWF  5-135
GQF67IL01D5PMS  5-97
GQF67IL01AONRB  5-60
GQF67IL01BNA85  5-0



> 
>> Well I will probably stick to 'sfffile -t trimpositions.txt myfile.sff'
>> anyways hoping it will be faster. ;)
> 
> If you have the trim positions in a suitable text file, and you want
> to apply them
> to an SFF file, and you are running on Linux so you can use sfffile, then that
> would work an may well be faster.

Yes, sfffile works for me while I probably see some bug in it (have to re-test
to be sure).

> 
> I'm a bit confused if you are trying to write out a new trimmed SFF file, a pair
> of trimmed FASTA and QUAL files, or even a trimmed FASTQ file. All of
> those are possible with Biopython.

I wanted either trimmed fasta+qual or trimmed sff (preferably) both with my _new_
trim points. From the above it is now clear for fasta+qual it can be done through
biopython while for sff alterations/creations I have to stick to sfffile (which
is fine for me).

Thanks,
Martin



More information about the Biopython mailing list