[BioRuby] Blast with file as a query option?

Naohisa GOTO ngoto at gen-info.osaka-u.ac.jp
Tue Apr 7 04:03:03 UTC 2009


On Sun, 5 Apr 2009 14:13:37 -1000
Kevin English <kenglish at gmail.com> wrote:

> Hello,
>   I have two very large local fasta files that I wish to blast against one
> another and parse the results in BioRuby. I'm wondering if there is a way
> to mimic the behavior of this blast command:
> blastall -p blastn -i Large_list_sequences_1.fasta -d Large_list_sequences_2
> where Large_list_sequences_2 is a formatted fasta db. My current
> implementation opens Large_list_sequences_1.fasta and goes through it
> sequence by sequence. It seems to run pretty slow. I'm wondering if I can in
> some way do the above blast command and loop through the results and get a
> performance gain.

To improve performance, I strongly recommend adding options to BLAST:
  -e  Expectation value (E) [Real]
    default = 10.0
  -v  Number of database sequences to show one-line descriptions for (V) [Integer]
    default = 500
  -b  Number of database sequence to show alignments for (B) [Integer]
    default = 250

Setting these to smaller values than the defaults reduces the size of
the output report, which shortens both the search and the parsing time.

Running BLAST once with multiple query sequences, instead of once per
sequence, can also improve performance. In addition, when the query
sequences are already in a local file, it may be better to call the
blastall command directly, without Bio::Blast, and parse its output file.

For example,

  require 'bio'
  require 'tempfile'

  # %w() builds an array of arguments, so "-o" and the path can be appended.
  command = %w( blastall -p blastn -i Large_list_sequences_1.fasta
                -d Large_list_sequences_2 -e 0.0001 -b 20 -v 20 )
  tempfile = Tempfile.new('blastout')
  command = command + [ "-o", tempfile.path ]
  system(*command)
  # After system(), error checks will be needed but skipped.
  ff = Bio::FlatFile.open(tempfile)
  ff.each do |report|
    # For example, prints query_def and target_def
    report.each do |hit|
      print report.query_def, "\t", hit.target_def, "\n"
    end
  end
  # Close and remove the temporary file.
  tempfile.close(true)
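
The error check that the comment above says is skipped could look
roughly like the following. This is only a minimal sketch using Ruby's
standard Kernel#system return values; the helper name run_checked is
hypothetical and is not part of BioRuby.

```ruby
# Hypothetical helper (not a BioRuby API): runs an external command given
# as an argument list and raises if it cannot be started or if it exits
# with a non-zero status.
def run_checked(*command)
  ok = system(*command)
  if ok.nil?
    # system() returns nil when the command could not be executed at all.
    raise "could not execute: #{command.first}"
  elsif !ok
    # system() returns false when the command exited with non-zero status.
    raise "#{command.first} failed with exit status #{$?.exitstatus}"
  end
  true
end

# Intended use with the example above (commented out so that this sketch
# runs without blastall installed):
#   run_checked(*command)
#   raise "empty BLAST report" if File.zero?(tempfile.path)
```

Passing the command as separate arguments, rather than as one string,
avoids shell quoting problems with file names.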

> For any curious, my code is on github:
> http://github.com/kenglishhi/bioflexrails/tree/master
> The file that is doing the blasts is under app/model/biodatabase.rb.
> I'm trying to write a rails app that uses biosql db and allows this biologist to
> organize his sequences. I'm very new to bioinformatics but have a lot of
> experience with Ruby on Rails.
> Thanks in advance for your help.

In general, a BLAST search against a very large database takes a
very long time, and using a batch queueing system might be needed.
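
One common way to prepare queries for a batch system, sketched here as
an assumption and not something taken from this thread, is to split the
query FASTA data into smaller chunks so that each chunk can be searched
as a separate job. The helper name split_fasta is hypothetical, and for
brevity it holds the whole text in memory, which may not suit very large
files.

```ruby
# Hypothetical helper: splits FASTA-formatted text into chunks holding at
# most per_chunk sequences each, and returns an array of FASTA strings.
def split_fasta(text, per_chunk)
  # Each record starts at a line beginning with ">".
  records = text.split(/^(?=>)/).reject { |r| r.strip.empty? }
  records.each_slice(per_chunk).map { |slice| slice.join }
end

# Small in-memory example with three sequences, two per chunk.
example = ">a\nACGT\n>b\nGG\nCC\n>c\nTT\n"
chunks = split_fasta(example, 2)
chunks.each_with_index do |chunk, i|
  puts "chunk #{i}: #{chunk.count('>')} sequence(s)"
end
```

Each chunk could then be written to its own file and given to blastall
with -i, as in the example above.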


Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
