[Bioperl-l] Help parsing PSI-BLAST XML reports
    Torsten Seemann 
    torsten.seemann at infotech.monash.edu.au
       
    Thu Apr  5 01:34:17 UTC 2007
    
    
  
Dear all,
I have been migrating all our BLAST infrastructure to use the XML
output mode (the "blastpgp -m 7" option), referred to as the
'blastxml' format in BioPerl. I had never used SearchIO to parse a
PSI-BLAST XML report before, and I encountered some issues I hope you
can help me with:
1. When loading with Bio::SearchIO->new(-format=>'blastxml') I get back a
Bio::Search::Result::GenericResult object. This means I cannot use
the PSI-BLAST functions like iterations() and psiblast() provided by
Bio::Search::Result::BlastResult. I'm guessing this is because the
XML output reports itself as plain BLASTP output:
<BlastOutput_program>blastp</BlastOutput_program>
How do I determine if it is a PSI-BLAST report?
2. Usually a PSI-BLAST report has multiple iterations. The XML output
has <Iteration> tags, but it took me a while to figure out that these
get mapped to Bio::Search::Result objects accessible via
Bio::SearchIO->next_result().
Is this the proper way to process the iterations?
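
To make the question concrete, here is a minimal sketch of how the iterations appear in the raw XML, independent of BioPerl. It uses Python's standard xml.etree.ElementTree. The sample report is a trimmed, made-up stand-in for real "blastpgp -m 7" output, the hit IDs are hypothetical, and the element names (Iteration, Iteration_iter-num, Iteration_hits, Hit, Hit_id) are assumed to match the NCBI BLAST XML DTD.

```python
import xml.etree.ElementTree as ET

# Trimmed, made-up stand-in for "blastpgp -m 7" output.
# Hit IDs are hypothetical and only serve the illustration.
SAMPLE = """<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_db>uniprot_sprot</BlastOutput_db>
  <BlastOutput_query-def>MyProtein</BlastOutput_query-def>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_hits>
        <Hit><Hit_id>hypothetical_hit_A</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_hits>
        <Hit><Hit_id>hypothetical_hit_A</Hit_id></Hit>
        <Hit><Hit_id>hypothetical_hit_B</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def iterations(xml_text):
    """Return (iteration number, [hit ids]) pairs in document order."""
    root = ET.fromstring(xml_text)
    out = []
    for it in root.iter("Iteration"):
        num = int(it.findtext("Iteration_iter-num"))
        hits = [h.findtext("Hit_id") for h in it.iter("Hit")]
        out.append((num, hits))
    return out


for num, hits in iterations(SAMPLE):
    print(num, hits)
```

Each <Iteration> element here corresponds to one pass of the search, which is consistent with SearchIO handing back one result object per iteration.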
3. I also notice that only the first result (iteration) has the
query_name set. In subsequent results it is empty:
RESULT 1 Bio::Search::Result::GenericResult, algorithm= BLASTP, query=MyProtein , db=uniprot_sprot
RESULT 2 Bio::Search::Result::GenericResult, algorithm= BLASTP, query= , db=uniprot_sprot
Is this a bug or expected?
Are a lot of these problems simply due to limitations of the NCBI
BLAST XML DTD?
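
Whether the empty query name originates in the XML itself can be checked without BioPerl. The following is a minimal sketch, assuming the query definition is stored only once at the top level (BlastOutput_query-def), as in the made-up sample below. Iteration_query-def is an element name that some NCBI output may carry per iteration; whether blastpgp emits it is an assumption to verify against a real report.

```python
import xml.etree.ElementTree as ET

# Made-up sample: the query definition is stored once, at the top level,
# and the iterations carry no query definition of their own.
SAMPLE = """<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_db>uniprot_sprot</BlastOutput_db>
  <BlastOutput_query-def>MyProtein</BlastOutput_query-def>
  <BlastOutput_iterations>
    <Iteration><Iteration_iter-num>1</Iteration_iter-num></Iteration>
    <Iteration><Iteration_iter-num>2</Iteration_iter-num></Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def query_names(xml_text):
    """Return the top-level query name and the per-iteration query names."""
    root = ET.fromstring(xml_text)
    top = root.findtext("BlastOutput_query-def")
    # findtext() returns None when an iteration has no query definition.
    per_iteration = [
        it.findtext("Iteration_query-def") for it in root.iter("Iteration")
    ]
    return top, per_iteration


top, per_iteration = query_names(SAMPLE)
print("top-level query:", top)
print("per-iteration query:", per_iteration)
```

If the per-iteration values also come back as None on a real report, a parser would have to carry the top-level name forward to every iteration, which might explain the empty query in the later results.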
--Torsten
    
    