[Bioperl-l] Next Gen Formats

Chris Fields cjfields at illinois.edu
Fri Mar 12 13:04:53 UTC 2010


On Mar 12, 2010, at 4:06 AM, Peter wrote:

> On Fri, Mar 12, 2010 at 3:35 AM, Chris Fields <cjfields at illinois.edu> wrote:
>> Ryan,
>> 
>> We would have to see example files to get an idea of how feasible it is.
>>  You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual
>> stream, and interleave the two somehow.  However, BioPerl qual
>> scores are PHRED-based by default, and I'm not sure how color-space
>> data would work within that schematic.
>> 
>> chris
> 
> Chris,
> 
> I am under the (possibly mistaken) assumption that PHRED scores
> are used for SOLiD color space QUAL files - the key issue is each
> score corresponds to the color call in the color sequence.
> 
> Ignoring color-space for a moment, are there BioPerl examples
> of iterating over a pair of sequence-space FASTA and QUAL files?
> i.e. What you'd get if you had a FASTQ file to iterate over.
> 
> [I guess Ryan could just merge the color-space FASTA and
> QUAL into a color-space FASTQ file and iterate over that]
> 
> Peter

If they're PHRED scores then it should be fine, though we may need to work in a few color-space-specific things.

Iterating over pairs is something that has popped up before.  For output, in the Bio::SeqIO::fastq module there is code for writing fasta/qual (to two separate streams), where I'm assuming one could do something like:

--------------------------------
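use Bio::SeqIO;

# All three streams use the 'fastq' format; the fastq writer
# provides write_fasta() and write_qual() for split output.
my $in   = Bio::SeqIO->new(-format => 'fastq', -file => 'foo.fastq');
my $out1 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fasta');
my $out2 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.qual');

while (my $seq = $in->next_seq) {
    $out1->write_fasta($seq);   # sequence only
    $out2->write_qual($seq);    # quality scores only
}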
--------------------------------

Note that all three streams use the 'fastq' format instead of 'fasta' or 'qual'.  This should work with those formats as well; I just haven't tried it myself (it's a bug otherwise).

I'm assuming for input it would be something like:

--------------------------------
my $in1 = Bio::SeqIO->new(-format => 'fasta', -file => 'foo.fasta');
my $in2 = Bio::SeqIO->new(-format => 'qual', -file => 'foo.qual'); 
my $out = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fastq'); 

# 'qual' parser joins the two streams
while (my $seq = $in2->next_seq($in1)) {
    $out->write_seq($seq);
}
--------------------------------
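
If the 'qual' parser turns out not to accept a second stream, the pairing can also be done by hand.  The following is a minimal sketch, not tested against real SOLiD data: it assumes both files list the same records in the same order, and the helper name merge_fasta_qual is made up for illustration.  It reads one record from each stream, checks that the IDs match, and builds a Bio::Seq::Quality object for the fastq writer.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq::Quality;

# Hypothetical helper (not part of BioPerl): merge a FASTA file and a
# QUAL file into one FASTQ file.  Assumes both files hold the same
# records in the same order.  Returns the number of records written.
sub merge_fasta_qual {
    my ($fasta_file, $qual_file, $fastq_file) = @_;

    my $fasta = Bio::SeqIO->new(-format => 'fasta', -file => $fasta_file);
    my $qual  = Bio::SeqIO->new(-format => 'qual',  -file => $qual_file);
    my $out   = Bio::SeqIO->new(-format => 'fastq', -file => ">$fastq_file");

    my $count = 0;
    while (my $seq = $fasta->next_seq) {
        # Pull the matching quality record from the second stream.
        my $q = $qual->next_seq
            or die "QUAL file ended early at record ", $seq->display_id, "\n";
        $seq->display_id eq $q->display_id
            or die "ID mismatch: ", $seq->display_id,
                   " vs ", $q->display_id, "\n";

        # Combine sequence and scores into a single object.
        my $sq = Bio::Seq::Quality->new(
            -id   => $seq->display_id,
            -desc => $seq->desc,
            -seq  => $seq->seq,
            -qual => $q->qual,    # array ref of PHRED scores
        );
        $out->write_seq($sq);
        $count++;
    }
    $out->close;
    return $count;
}

# Example call:
# merge_fasta_qual('foo.fasta', 'foo.qual', 'foo.fastq');
```

Dying on the first ID mismatch is a deliberate choice here: silently pairing a sequence with the wrong scores would be worse than stopping.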

chris




