[BioRuby] Wu-blast report parsing issue

Naohisa GOTO ngoto at gen-info.osaka-u.ac.jp
Thu Aug 9 16:15:45 UTC 2007


Hello,

I'm sorry for the late reply.

It seems this error is caused by line 29 of your XML file:
         <Hit_def></Hit_def>
The content of the Hit_def element is empty, which is presumably why
the parser ends up calling clone on nil.
 
For sequences with no definition, NCBI BLAST outputs
          <Hit_def>No definition line found</Hit_def>
and the content of the Hit_def is not empty.

This means that the XML output of WU-BLAST is sometimes
incompatible with that of NCBI BLAST.

However, because the difference is very small,
I think it can be handled on the BioRuby side.
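
Until that is handled in BioRuby, one possible workaround is to normalize
the report before parsing it. The following is only a minimal sketch:
normalize_hit_def is a hypothetical helper (not part of BioRuby), and it
assumes that an empty Hit_def element is the only incompatibility.

```ruby
# Hypothetical helper (not part of BioRuby): fill in empty Hit_def elements
# so that WU-BLAST XML resembles NCBI BLAST XML for sequences that have
# no definition line.
NCBI_PLACEHOLDER = 'No definition line found'

def normalize_hit_def(xml)
  # Matches <Hit_def></Hit_def> (optionally with whitespace inside)
  # and the self-closing form <Hit_def/>.
  xml.gsub(%r{<Hit_def>\s*</Hit_def>|<Hit_def\s*/>}) do
    "<Hit_def>#{NCBI_PLACEHOLDER}</Hit_def>"
  end
end
```

The normalized string could then be given to Bio::Blast.reports, for
example wrapped in a StringIO; I have not verified this against every
BioRuby version.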


I can reproduce the same error with the following data:

(saved as database.fst)
--------------------------------------------------------------
>lcl|EXAMPLE
AGACATAACCCAAACAGAATAACCTGAAAGAGACCCACGACCATGCAGGGGACCTGGATG
GTGCTGTTGGCACTGATATTGGGCACCTTCGGGGAGCTTGCTATGGCCTTACAGTGCTAC
ACCTGTGCGAATCCTGTGAGTGCATCCAACTGTGTCACCACCACCCACTGCCACATCAAT
GAAACCATGTGCAAGACTACGCTCTACTCCCTGGAGATTGTTTTCCCTTTCCTGGGGGAC
TCCACGGTGACCAAGTCCTGCGCCAGCAAGTGTGAGCCTTCGGATGTGGATGGCATTGGG
CAAACCCGGCCAGTGTCCTGCTGCAATTCTGACCTATGCAACGTGGATGGGGCACCCAGC
CTGGGCAGTCCTGGTGGCCTGCTCCTTGCCCTGGCACTTTTCTTGCTCTTGGGTGTCCTG
CTGTAAAGCCATGGCCATCTAGCTCCACTCCCTTGTCCCTGACATCCCAGTTCCCTAATG
CCTAGAAGAAATACAATGGCCATCTGC
--------------------------------------------------------------

(saved as query.fst)
--------------------------------------------------------------
>Contig1
AGACATAACCCAAACAGAATAACCTGAAAGAGACCCACGACCATGCAGGGGACCTGGATG
GTGCTGTTGGCACTGATATTGGGCACCTTCGGGGAGCTTGCTATGGCCTTACAGTGCTAC
ACCTGTGCGAATCCTGTGAGTGCATCCAACTGTGTCACCACCACCCACTGCCACATCAAT
GAAACCATGTGCAAGACTACGCTCTACTCCCTGGAGATTGTTTTCCCTTTCCTGGGGGAC
TCCACGGTGACCAAGTCCTGCGCCAGCAAGTGTGAGCCTTCGGATGTGGATGGCATTGGG
CAAACCCGGCCAGTGTCCTGCTGCAATTCTGACCTATGCAACGTGGATGGGGCACCCAGC
CTGGGCAGTCCTGGTGGCCTGCTCCTTGCCCTGGCACTTTTCTTGCTCTTGGGTGTCCTG
CTGTAAAGCCATGGCCATCTAGCTCCACTCCCTTGTCCCTGACATCCCAGTTCCCTAATG
CCTAGAAGAAATACAATGGCCATCTGC
--------------------------------------------------------------
The sequence in query.fst is identical to the one in database.fst.
Only the definition line is different.

Commands for WU-BLAST:
% xdformat -n database.fst
% wu-blastall -p blastn -i query.fst -d database.fst \
  -o wu-blastn.xml -e 1e-10 -m 7 -F F

Commands for NCBI BLAST:
% formatdb -i database.fst -p F -o
% blastall -p blastn -i query.fst -d database.fst \
  -o ncbi-blastn.xml -e 1e-10 -m 7 -F F

Report of WU BLAST:
          <Hit_num>1</Hit_num>
          <Hit_id>lcl|EXAMPLE</Hit_id>
          <Hit_def></Hit_def>
          <Hit_accession>EXAMPLE</Hit_accession>
          <Hit_len>507</Hit_len>

Report of NCBI BLAST:
          <Hit_num>1</Hit_num>
          <Hit_id>lcl|EXAMPLE</Hit_id>
          <Hit_def>No definition line found</Hit_def>
          <Hit_accession>EXAMPLE</Hit_accession>
          <Hit_len>507</Hit_len>

The Hit_def line of WU-BLAST is incompatible with that of NCBI BLAST
for sequences with no definition line.

The versions of WU BLAST and NCBI BLAST were:
2.0MP-WashU [04-May-2006] [linux26-i786-ILP32F64 2006-05-09T12:19:58]
blastn 2.2.15 [Oct-15-2006]


> I've tried feeding my script normal (-m0) wublast output too. It  
> doesn't crash - but @reportsArray.length == 0).

Bio::Blast.reports can only be used for XML output.
For the default (-m 0) format,

49:		@reportsArray = Bio::FlatFile.new(nil, @file).to_a

should work.
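
If the script has to accept both formats, it could check the beginning of
the file before choosing a parser. This is only a rough sketch:
blast_output_format is a hypothetical helper (not part of BioRuby), and it
assumes that XML reports start with an XML declaration or a BlastOutput
element.

```ruby
# Hypothetical helper (not part of BioRuby): guess whether BLAST output is
# XML (-m 7) or the default text format (-m 0) by looking at its first
# non-blank characters.
def blast_output_format(text)
  head = text.lstrip
  if head.start_with?('<?xml') || head.start_with?('<BlastOutput')
    :xml
  else
    :text
  end
end
```

For :xml the file would go to Bio::Blast.reports, and for :text to
Bio::FlatFile as in the line above.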

Thank you,

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org


On Tue, 24 Apr 2007 21:03:56 +0200
Yannick Wurm <Yannick.Wurm at unil.ch> wrote:

> Hello,
> 
> I've generated a blast report using wu-blastall with -m7 to get xml  
> output.
> It should be easy to get this into ruby, but I'm having a hard time.
> 
> Here's the error I get:
> #~/ruby/dotGraphOfStrongHits.rb simple.xml simple.xml.dot 1.0e-5
> /sw/lib/ruby/site_ruby/1.8/bio/appl/blast/xmlparser.rb:158:in  
> `clone': can't clone NilClass (TypeError)
>          from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast/xmlparser.rb: 
> 158:in `xmlparser_parse_hit'
>          from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast/xmlparser.rb: 
> 72:in `xmlparser_parse'
>          from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast/xmlparser.rb: 
> 41:in `xmlparser_parse'
>          from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast/report.rb: 
> 66:in `auto_parse'
>          from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast/report.rb: 
> 89:in `initialize'
>          from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast.rb:115:in  
> `reports'
>          from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast.rb:109:in  
> `reports'
>          from /Users/yannickwurm/ruby/wublastReportParser.rb:49:in  
> `loadBlastReport'
>          from /Users/yannickwurm/ruby/dotGraphOfStrongHits.rb:30:in  
> `parseFile'
>          from /Users/yannickwurm/ruby/dotGraphOfStrongHits.rb:61
> 
> The corresponding lines of wublastReportParser.rb are:
> 48:		@file = File.open(@blast_report, IO::RDONLY)
> 49:		@reportsArray = Bio::Blast.reports(@file)
> 
> 
> Does wublast not respect the standard blast xml output?
> I've tried feeding my script normal (-m0) wublast output too. It  
> doesn't crash - but @reportsArray.length == 0).
> 
> My xml-ed blast report is here:
> http://wwwpeople.unil.ch/yannick.wurm/simple.xml
> 
> 
> What am I doing wrong? Do you have ideas how to solve this issue?
> 
> 
> My version info:
> 	wu-blastall 2.2.6
> 	ruby 1.8.4 (2005-12-24) [powerpc-darwin]
> 	bio.rb,v 1.84 2007/04/05
> 
> Thanks in advance for any pointers!
> yannick
> 
> --------------------------------------------
>           yannick . wurm @ unil . ch
> Ant Genomics, Ecology & Evolution @ Lausanne
>    http://www.unil.ch/dee/page28685_fr.html
> 
> 
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby



