[Bioperl-l] Working with SeqWithQuality
Roy Chaudhuri
roy.chaudhuri at gmail.com
Thu Jun 19 15:44:37 UTC 2008
Hi Mick,
I think you want Bio::Tools::Alignment::Trim. There are lots of caveats
in the POD but as I recall it works fine.
Here is a small script I wrote to return contigs over a certain length
and a certain minimum quality (not exactly what you asked for but I'm
sure it could be adapted).
#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;
use Bio::Tools::Alignment::Trim;
use Getopt::Long;
die "Usage: trimqual fasta_file qual_file -qual 20 -window 10 -min
500\n" unless $ARGV[1];
my $phred=20;
my $window=10;
my $min=500;
GetOptions ('phred|qual=i'=>\$phred,
'window=i'=>\$window,
'minimum=i'=>\$min) or die "Unrecognised option\n";
my $fasta=Bio::SeqIO->newFh(-file=>$ARGV[0], -format=>'fasta');
my $quality=Bio::SeqIO->newFh(-file=>$ARGV[1], -format=>'qual');
my $out=Bio::SeqIO->newFh(-format=>'fasta');
my $count=0;
while (<$fasta>) {
my $seq=$_;
my $qual=<$quality>;
my $trim=Bio::Tools::Alignment::Trim->new(-phreds=>$phred,
-windowsize=>$window);
my ($start,$end) = @{$trim->trim_singlet($seq->seq,
join(' ',@{$qual->qual}),
$seq->display_id,'singlet')};
next if $end==0;
my $trimmed_sequence=substr($seq->seq, $start, $end+1-$start);
my $length=length $trimmed_sequence;
print STDERR $seq->display_id, ": $start-$end [$length bp]";
if ($length<$min) {
print STDERR " - skipped\n";
next;
} else {
print STDERR "\n";
}
my $trimseq=Bio::Seq->new(-seq=>$trimmed_sequence,
-id=>$seq->display_id);
print $out $trimseq;
$count++;
}
print STDERR "\nFinished. $count contigs larger than $min bp. were
found.\n";
Cheers,
Roy.
--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.
michael watson (IAH-C) wrote:
> Hi
>
> I have fasta files and separate quality fasta files. I understand all
> about constructing a SeqWithQuality object, that's the easy bit, but are
> there no functions for actually manipulating the sequence based on the
> quality values? For example, trimming the ends where a moving window
> average quality does not go above a certain value, or masking areas with
> low quality?
>
> Thanks
> Mick
>
> The information contained in this message may be confidential or legally
> privileged and is intended solely for the addressee. If you have
> received this message in error please delete it & notify the originator
> immediately.
> Unauthorised use, disclosure, copying or alteration of this message is
> forbidden & may be unlawful.
> The contents of this e-mail are the views of the sender and do not
> necessarily represent the views of the Institute.
> This email and associated attachments has been checked locally for
> viruses but we can accept no responsibility once it has left our
> systems.
> Communications on Institute computers are monitored to secure the
> effective operation of the systems and for other lawful purposes.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list