[Bioperl-l] blasting with multiple query sequences?

Tobias Thierer tthierer@informatik.uni-tuebingen.de
Fri, 4 Oct 2002 13:15:30 +0200 (MST)


Hi,

Bio::Tools::Run::StandAloneBlast::blastall

accepts a reference to an array of Bio::Seq objects.
If such a reference is passed, all the sequences in
the array will be blasted against the specified database.
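
For reference, here is a minimal sketch of the kind of call I mean.
The program, the database name 'ecoli.nt', and the sequence IDs and
residues are placeholders made up for illustration; it assumes a
local BLAST installation and a formatted database of that name.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::Tools::Run::StandAloneBlast;

# Hypothetical settings: substitute your own program and database.
my @params  = ('program' => 'blastn', 'database' => 'ecoli.nt');
my $blaster = Bio::Tools::Run::StandAloneBlast->new(@params);

# Two made-up query sequences, collected in an array reference.
my $sequences = [
    Bio::Seq->new(-display_id => 'query1', -seq => 'ATGGCGTACGTTAGCCTAGGATCC'),
    Bio::Seq->new(-display_id => 'query2', -seq => 'TTGACCGGTAACGTTCAGGCTAAC'),
];

# A single blastall() call is given every sequence in the array.
my $report = $blaster->blastall($sequences);
```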

However, although I searched the documentation thoroughly,
I could not find any hint on how to get the query sequence
that belongs to a specific Subject or HSP. The BLAST
report gives me all HSPs grouped by subject, but it
seems that I have no way to find out which of
the sequences that I passed to blastall() actually caused
the hit.

I'd really appreciate it if someone could tell me if and
how this is possible. I need to blast multiple sequences
against a database, and get the high scoring matches for
each query sequence. The blast report, however, groups
the blast result by subject, where each subject consists
of a number of HSPs. $hsp->query is only a
Bio::SeqFeature::Similarity object, not the original 
sequence object. If I pass only one sequence to
blastall() at a time, the blast executable will be run
multiple times and thus the database will unnecessarily
be loaded multiple times from the hard disk into
memory.

Any help would really be appreciated!

Regards,

  Tobias


P.S.: Here is what I am currently doing:

  # ($sequences is a reference to an array of Bio::Seq
  # objects)

  my $report = $blaster->blastall($sequences);
  while(my $sbjct = $report->nextSbjct) {
    while (my $hsp = $sbjct->nextHSP) {

      # How can I find out which of the sequences in
      # @$sequences caused the hit?

      print "some sequence matched at ", $hsp->query->start, "->",
            $hsp->query->end, "\n";
    }
  }