[Bioperl-l] How to input fasta and qual into Bio::Seq::Quality?

Thu Jul 24 15:09:08 UTC 2008

    I usually use both sequence and quality data. See below[1] if you 
don't know what I mean. As I mentioned int a previous thread, 
Bio::Seq::Quality seems great for this sort of work. But, near as I can 
tell, Bio::SeqIO does not directly create these objects.

Specifically, what I would like is for someone to tell me that my way of 
creating a Bio::Seq::Quality object is unnecessarily brute force. I 
would like this to be possible:

my $BSQ_in_obj   =Bio::SeqIO
        ->new( -file    =>$seq_infile,
                    -qfile  =>$qual_infile
              );
such that $BSQ_in_obj->next_seq would create a Bio::Seq::Quality object.

But I don't have a good overall sense of the guts of bioperl, so I'm not 
sure if this would create other problems.

It might not be clear what my problem is. So I've pasted code to show 
how I currently create a Bio::Seq::Quality object.

The code and data files are here: 
http://www.genomics.purdue.edu/~pmiguel/programs/BSQ_simple_trial.perl
http://www.genomics.purdue.edu/~pmiguel/programs/HEX1057J19scaffold.fasta
http://www.genomics.purdue.edu/~pmiguel/programs/HEX1057J19scaffold.fasta.qual

Or just the text:
#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;
use Bio::Seq::Quality;

my ($seq_infile,$qual_infile)  =(scalar @ARGV == 1)
                         ?($ARGV[0]    ,"$ARGV[0].qual")
                         :@ARGV;

#Create input objects for both a seq (fasta) and qual file           
    my $in_seq_obj  =Bio::SeqIO
        ->new( -file    =>$seq_infile,
               -format  =>'fasta',
              );

    my $in_qual_obj  =Bio::SeqIO
        ->new( -file    =>$qual_infile,
               -format  =>'qual',
              );

while (1){
    #create actual objects for both a seq and its associated qual
    my $seq_obj =$in_seq_obj->next_seq() || last;
    my $qual_obj=$in_qual_obj->next_seq();

    #use seq and qual object methods feed info for new BSQ object
    my $bsq_obj =Bio::Seq::Quality->new
                 ( -seq =>  $seq_obj->seq(),
                   -qual=>  $qual_obj->qual(),
                 );

    #Do something with BSQ object to show off its powers
    print    $bsq_obj    ->subseq(10,100)       ,"\n"
            ,$bsq_obj    ->subqual_text(10,100) ,"\n";

}

It seems unwieldy though. Is there a better way?

Phillip

[1]    With finished sequence, quality should be uniformly high. But 
frequently we work with single pass, or other types of unfinished 
sequences. These type of sequences will have variable quality. That is, 
the SNP you just found may really be a sequencing ("measurement") 
error--a miscall.
    Pairing a fasta sequence file with a quality file, like the 
phred/phrap/consed package does, is now a common way to handle sequence 
data of variable quality. All modern sequencers (that I know of)  will 
generate quality scores for each base called.