[Bioperl-l] Help parsing PSI-BLAST XML reports

Torsten Seemann torsten.seemann at infotech.monash.edu.au
Thu Apr 5 01:34:17 UTC 2007


Dear all,

I have been migrating all our BLAST infrastructure to use the XML
output mode (the "blastpgp -m 7" option), referred to as the 'blastxml'
format in Bioperl. I had never used SearchIO to parse a PSI-BLAST XML
report before, and I have run into some issues I hope you can help me with:

1. When loading with Bio::SearchIO->new(-format=>'blastxml') I get back a
Bio::Search::Result::GenericResult object. This means I cannot use
the PSI-BLAST methods such as iterations() and psiblast() provided by
Bio::Search::Result::BlastResult. I'm guessing this is because the
XML output reports itself as plain BLASTP output:
<BlastOutput_program>blastp</BlastOutput_program>

How do I determine if it is a PSI-BLAST report?

2. A PSI-BLAST report usually has multiple iterations. The XML output
has <Iteration> tags, but it took me a while to figure out that each one
is mapped to a separate Bio::Search::Result::GenericResult object,
returned in turn by the SearchIO object's next_result() method.

Is this the proper way to process the iterations?

3. I also noticed that only the first result (iteration) has its
query_name set. Subsequent ones are empty:
  RESULT 1 Bio::Search::Result::GenericResult, algorithm= BLASTP, query=MyProtein , db=uniprot_sprot
  RESULT 2 Bio::Search::Result::GenericResult, algorithm= BLASTP, query= , db=uniprot_sprot

Is this a bug or expected?
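In the meantime, a workaround is easy to sketch outside Bioperl. The Python below is not Bioperl's API. The helper name iteration_summaries and the sample report are made up for illustration. The sketch walks each <Iteration> in the raw XML and handles a missing <Iteration_query-def> by falling back to the report-level <BlastOutput_query-def>. It assumes the report uses NCBI's BLAST XML element names.

```python
import xml.etree.ElementTree as ET

def iteration_summaries(xml_text):
    """Return (iteration number, query name, hit count) for each <Iteration>."""
    root = ET.fromstring(xml_text)
    # Report-level query name, used when an Iteration omits its own.
    fallback = root.findtext("BlastOutput_query-def", default="").strip()
    summaries = []
    for it in root.findall("BlastOutput_iterations/Iteration"):
        num = int(it.findtext("Iteration_iter-num", default="0"))
        # Empty or missing per-iteration name -> inherit the report-level one.
        name = it.findtext("Iteration_query-def", default="").strip() or fallback
        hits = len(it.findall("Iteration_hits/Hit"))
        summaries.append((num, name, hits))
    return summaries

# Hypothetical two-round report; round 2 lacks Iteration_query-def.
SAMPLE = """<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_query-def>MyProtein</BlastOutput_query-def>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_query-def>MyProtein</Iteration_query-def>
      <Iteration_hits><Hit><Hit_num>1</Hit_num></Hit><Hit><Hit_num>2</Hit_num></Hit></Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_hits><Hit><Hit_num>1</Hit_num></Hit></Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

for num, name, hits in iteration_summaries(SAMPLE):
    print(num, name, hits)
```

With the sample above, the script prints "1 MyProtein 2" and then "2 MyProtein 1". The second round now carries the query name.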

I'm guessing many of these problems are simply due to limitations of
the NCBI BLAST XML DTD. Is that the case?
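For question 1, the program element alone does not tell the two kinds of report apart, so a heuristic is one possible fallback. The Python sketch below again reads the XML directly rather than using Bioperl, and looks_like_psiblast is a hypothetical helper. It treats a 'blastp' report as PSI-BLAST when the report holds more than one <Iteration> but at most one distinct per-iteration query name. This is an assumption, not a guarantee. It can be fooled by a multi-query report whose queries share a name, or by one whose iterations omit names altogether.

```python
import xml.etree.ElementTree as ET

def looks_like_psiblast(xml_text):
    """Heuristic guess (an assumption, not a DTD guarantee) that a
    'blastp' XML report is really multi-round PSI-BLAST output."""
    root = ET.fromstring(xml_text)
    if root.findtext("BlastOutput_program", default="").strip() != "blastp":
        return False
    iterations = root.findall("BlastOutput_iterations/Iteration")
    # Distinct non-empty per-Iteration query names; later rounds may omit them.
    queries = {it.findtext("Iteration_query-def", default="").strip()
               for it in iterations} - {""}
    # Several rounds but at most one query name suggests PSI-BLAST rounds.
    return len(iterations) > 1 and len(queries) <= 1

# Hypothetical two-round report in which round 2 has no query name.
SAMPLE = """<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_query-def>MyProtein</BlastOutput_query-def>
  <BlastOutput_iterations>
    <Iteration><Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_query-def>MyProtein</Iteration_query-def></Iteration>
    <Iteration><Iteration_iter-num>2</Iteration_iter-num></Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

print(looks_like_psiblast(SAMPLE))  # True
```

A report with only one <Iteration> returns False here, as does one whose program is not blastp.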

--Torsten
