[Bioperl-l] use blast to extract similar sequences

Fri Sep 16 13:22:07 UTC 2011

That seems like a pretty straightforward thing to do; there isn't an all-in-one way of doing this, but that's a good thing (it's a separation of concerns).

1) Run and parse BLAST results and grab seqID and coordinates for each hit (or each HSP for each hit) (Bio::SearchIO)
2) Pull the right subsequence +/- 20bp using above from the indexed flat file of your reference (Bio::DB::Fasta)

You can get revcomped sequence from Bio::DB::Fasta directly by flipping coordinates:

  # raw sequence
  my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
  my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);

chris

On Sep 16, 2011, at 3:51 AM, Ross KK Leung wrote:

> I wonder whether bioperl has any built-in modules that extracts sequences based on blast results. For example, a short query sequence of length 1000 is to blast against a reference genome of 3M. The homologous sequence of 1000 +/-  20 is extracted. Why is +/- 20 needed? Because we can't guarantee there must have a good match. Frequent blast users may be well aware that then there can be coverage, split-up due to local alignments, etc and that's why I would like to know if anybody has already developed a module to handle this kind of problem. Thanks in advance!
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l