[Bioperl-l] How to input fasta and qual into Bio::Seq::Quality?
Phillip San Miguel
pmiguel at purdue.edu
Thu Jul 24 15:09:08 UTC 2008
I usually use both sequence and quality data. See below[1] if you
don't know what I mean. As I mentioned int a previous thread,
Bio::Seq::Quality seems great for this sort of work. But, near as I can
tell, Bio::SeqIO does not directly create these objects.
Specifically, what I would like is for someone to tell me that my way of
creating a Bio::Seq::Quality object is unnecessarily brute force. I
would like this to be possible:
my $BSQ_in_obj =Bio::SeqIO
->new( -file =>$seq_infile,
-qfile =>$qual_infile
);
such that $BSQ_in_obj->next_seq would create a Bio::Seq::Quality object.
But I don't have a good overall sense of the guts of bioperl, so I'm not
sure if this would create other problems.
It might not be clear what my problem is. So I've pasted code to show
how I currently create a Bio::Seq::Quality object.
The code and data files are here:
http://www.genomics.purdue.edu/~pmiguel/programs/BSQ_simple_trial.perl
http://www.genomics.purdue.edu/~pmiguel/programs/HEX1057J19scaffold.fasta
http://www.genomics.purdue.edu/~pmiguel/programs/HEX1057J19scaffold.fasta.qual
Or just the text:
#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;
use Bio::Seq::Quality;
my ($seq_infile,$qual_infile) =(scalar @ARGV == 1)
?($ARGV[0] ,"$ARGV[0].qual")
:@ARGV;
#Create input objects for both a seq (fasta) and qual file
my $in_seq_obj =Bio::SeqIO
->new( -file =>$seq_infile,
-format =>'fasta',
);
my $in_qual_obj =Bio::SeqIO
->new( -file =>$qual_infile,
-format =>'qual',
);
while (1){
#create actual objects for both a seq and its associated qual
my $seq_obj =$in_seq_obj->next_seq() || last;
my $qual_obj=$in_qual_obj->next_seq();
#use seq and qual object methods feed info for new BSQ object
my $bsq_obj =Bio::Seq::Quality->new
( -seq => $seq_obj->seq(),
-qual=> $qual_obj->qual(),
);
#Do something with BSQ object to show off its powers
print $bsq_obj ->subseq(10,100) ,"\n"
,$bsq_obj ->subqual_text(10,100) ,"\n";
}
It seems unwieldy though. Is there a better way?
Phillip
[1] With finished sequence, quality should be uniformly high. But
frequently we work with single pass, or other types of unfinished
sequences. These type of sequences will have variable quality. That is,
the SNP you just found may really be a sequencing ("measurement")
error--a miscall.
Pairing a fasta sequence file with a quality file, like the
phred/phrap/consed package does, is now a common way to handle sequence
data of variable quality. All modern sequencers (that I know of) will
generate quality scores for each base called.
More information about the Bioperl-l
mailing list