[Bioperl-l] Bio::DB::Sam - finding clipped regions
Frank Schwach
fs5 at sanger.ac.uk
Wed Mar 20 22:09:40 UTC 2013
Hi,
I need to report all the positions on a reference sequence where aligned
reads in a bam file have been hard-clipped.
I am using Bio::DB::Sam and I know that I can use callbacks to traverse
each individual read and get information about its alignment to each
position on the reference as in
my $sam = Bio::DB::Sam->new(
-bam =>$bam,
-fasta=>$fasta,
);
my $callback = sub {
my ( $id, $pos, $p ) = @_;
for my $pileup (@$p) {
my $aln = $pileup->alignment;
my $cigar = $aln->cigar_str;
# other stuff here
# ...
};
$sam->fast_pileup( $sequence_id, $callback );
But I don't see a straight-forward way (accepting that there may not be
one of course) to ask "is the next base of the read hardclipped".
It's not that difficult to unravel the cigar string and I have the start
of the alignment for the read, so I can follow the cigar string along to
the current base on the reference to see what the alignment is there and
what happens next.
I've just got that feeling that I'm missing something and there is
probably a better and more efficient way of doing this, maybe with
another tool? Using samtools mpileup I can get positions of SNPs and
INDELS but I can't see a way of collecting the hard-clipped positions
that I need or is that possible somehow (ok, not BioPerl, I know)
Thanks for your help guys!
Frank
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
More information about the Bioperl-l
mailing list