[Bioperl-l] Bio::DB::Sam - finding clipped regions

Frank Schwach fs5 at sanger.ac.uk
Wed Mar 20 22:09:40 UTC 2013


Hi,

I need to report all the positions on a reference sequence where aligned 
reads in a BAM file have been hard-clipped.
I am using Bio::DB::Sam and I know that I can use callbacks to traverse 
each individual read and get information about its alignment to each 
position on the reference, as in:

use Bio::DB::Sam;

my $sam = Bio::DB::Sam->new(
    -bam   => $bam,
    -fasta => $fasta,
);

# Called once per reference position with the reads piled up there
my $callback = sub {
    my ( $id, $pos, $p ) = @_;
    for my $pileup (@$p) {
        my $aln   = $pileup->alignment;
        my $cigar = $aln->cigar_str;
        # other stuff here
        # ...
    }
};

$sam->fast_pileup( $sequence_id, $callback );



But I don't see a straightforward way (accepting that there may not be 
one, of course) to ask "is the next base of the read hard-clipped?".
It's not that difficult to unravel the CIGAR string, and I have the start 
of the alignment for the read, so I can follow the CIGAR string along to 
the current base on the reference to see what the alignment is there and 
what happens next.
I've just got the feeling that I'm missing something and that there is 
probably a better and more efficient way of doing this, maybe with 
another tool. Using samtools mpileup I can get the positions of SNPs and 
indels, but I can't see a way of collecting the hard-clipped positions 
that I need. Is that possible somehow? (OK, not BioPerl, I know.)
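
To make the CIGAR-walking idea concrete, here is a minimal sketch in plain 
Perl. It is only an illustration under stated assumptions, not something 
taken from Bio::DB::Sam: it assumes a standard SAM-style CIGAR string 
(length followed by operation, e.g. "45M") and a 1-based alignment start. 
The helper name hard_clip_boundaries and the hash keys pos, side and 
clipped are made up for this example. Since hard clips can only sit at the 
ends of a CIGAR string, it reports the first and/or last aligned reference 
position of a read that borders a clip.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based reference positions that
# border a hard clip. $start is the leftmost aligned reference position,
# $cigar is assumed to be in standard SAM layout, e.g. "5H45M2D50M3H".
sub hard_clip_boundaries {
    my ( $start, $cigar ) = @_;

    # Split the CIGAR string into [length, operation] pairs.
    my @ops;
    while ( $cigar =~ /(\d+)([MIDNSHP=X])/g ) {
        push @ops, [ $1, $2 ];
    }
    return () unless @ops;

    # Only M, D, N, = and X advance the position on the reference.
    my $ref_len = 0;
    for my $op (@ops) {
        $ref_len += $op->[0] if $op->[1] =~ /^[MDN=X]$/;
    }
    my $end = $start + $ref_len - 1;

    # Hard clips can only be the first and/or the last operation.
    my @boundaries;
    push @boundaries, { pos => $start, side => 'left', clipped => $ops[0][0] }
        if $ops[0][1] eq 'H';
    push @boundaries, { pos => $end, side => 'right', clipped => $ops[-1][0] }
        if @ops > 1 && $ops[-1][1] eq 'H';
    return @boundaries;
}

# Example: a read starting at reference position 100
printf "%s clip of %d bases at reference position %d\n",
    $_->{side}, $_->{clipped}, $_->{pos}
    for hard_clip_boundaries( 100, '5H45M2D50M3H' );
```

If the CIGAR string arrives in a different layout, the regular expression 
would need adjusting. Calling this once per read rather than once per 
pileup position would also avoid re-parsing the same string many times.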

Thanks for your help guys!

Frank



-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


