[Biopython] Pulling Alignment From PSI-Blast Output

Michiel de Hoon mjldehoon at yahoo.com
Tue Feb 8 12:05:22 UTC 2011


I am surprised that the multiple alignment is not in the XML at all. Can it not be constructed from the information in the XML? Anyway, if it is in there, I would suggest using Bio.Entrez to parse the XML instead of the parser in Bio.Blast. The Bio.Entrez parser will give you all the information in the XML; the parser in Bio.Blast is more polished but may not give you all the information present in the PSI-Blast output.
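A minimal sketch of what constructing it from the XML could look like, in plain Python with no Biopython dependency. Each hit's HSP (the values of `Hsp_query-from`, `Hsp_qseq` and `Hsp_hseq`) is projected onto query coordinates, and columns where the query has a gap are dropped. The function name and the tuple layout are hypothetical, and dropping insertions is a simplifying assumption made here; the result is a query-length alignment, not necessarily identical to what PSI-Blast prints with -outfmt 3 or 4.

```python
def query_anchored_alignment(query_len, hsps):
    """Project pairwise HSPs onto query coordinates (hypothetical helper).

    hsps: iterable of (hit_id, query_from, qseq, hseq), where query_from is
    the 1-based HSP start on the query (as in <Hsp_query-from>) and qseq and
    hseq are the aligned query and hit strings (<Hsp_qseq>, <Hsp_hseq>).
    Returns a dict mapping hit_id to a string of length query_len.
    """
    rows = {}
    for hit_id, query_from, qseq, hseq in hsps:
        if hit_id in rows:
            continue  # keep only the first HSP seen for each hit
        row = ["-"] * query_len  # query positions not covered stay as gaps
        pos = query_from - 1  # convert to a 0-based query position
        for q, h in zip(qseq, hseq):
            if q == "-":
                continue  # insertion in the hit: no query column, so drop it
            row[pos] = h  # hit residue (or '-') aligned to this query residue
            pos += 1
        rows[hit_id] = "".join(row)
    return rows
```

For example, `query_anchored_alignment(10, [("hitA", 1, "MKT-AYIA", "MKTQAYLA")])` gives `{"hitA": "MKTAYLA---"}`: the extra Q in the hit, opposite the gap in the query, is dropped.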

--Michiel.

--- On Tue, 2/8/11, Brett Bowman <bnbowman at gmail.com> wrote:

From: Brett Bowman <bnbowman at gmail.com>
Subject: Re: [Biopython] Pulling Alignment From PSI-Blast Output
To: "Michiel de Hoon" <mjldehoon at yahoo.com>
Cc: biopython at biopython.org
Date: Tuesday, February 8, 2011, 2:40 AM

I thought about that, but there doesn't appear to be any multiple-alignment data in the XML file - just pair-wise alignments of the query with each hit.  In addition, when I parse the output file with NCBIXML I get a Bio.Blast.Record.Blast object, instead of a Bio.Blast.Record.PSIBlast object.  The Biopython cookbook describes how to work with a PSIBlast object, but it doesn't really cover how to make one...
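Since the XML does hold the pairwise query/hit alignments, they can at least be read out directly. The sketch below uses the standard library's xml.etree.ElementTree instead of Bio.Blast.NCBIXML or Bio.Entrez, only to keep it self-contained. It assumes the standard NCBI BLAST XML element names and that the last <Iteration> element holds the final PSI-Blast round; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def hsps_from_last_iteration(xml_text):
    """Return (query_len, hsps) for the final PSI-Blast round in BLAST XML.

    hsps is a list of (hit_id, query_from, qseq, hseq) tuples, one per HSP.
    Assumes standard NCBI BLAST XML element names and that the last
    <Iteration> element holds the final round.
    """
    root = ET.fromstring(xml_text)  # root is the <BlastOutput> element
    iterations = root.findall(".//Iteration")
    if not iterations:
        raise ValueError("no <Iteration> elements found")
    last = iterations[-1]
    # Prefer the per-iteration query length; fall back to the top-level one.
    length_text = last.findtext("Iteration_query-len") or root.findtext(
        "BlastOutput_query-len"
    )
    query_len = int(length_text)
    hsps = []
    for hit in last.iter("Hit"):
        hit_id = hit.findtext("Hit_id")
        for hsp in hit.iter("Hsp"):
            hsps.append(
                (
                    hit_id,
                    int(hsp.findtext("Hsp_query-from")),
                    hsp.findtext("Hsp_qseq"),
                    hsp.findtext("Hsp_hseq"),
                )
            )
    return query_len, hsps
```

To use it on a saved run, read the file into a string first, e.g. `with open("psiblast.xml") as handle: qlen, hsps = hsps_from_last_iteration(handle.read())`, where "psiblast.xml" is a hypothetical file name.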



On Mon, Feb 7, 2011 at 5:20 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:


One option you could try is to let PSI-Blast generate its output in XML and check if the information you need is present in the XML. If it is, you can parse the XML with the read() function in Bio.Entrez. You may find that Bio.Entrez needs an additional DTD file to be able to parse the PSI-Blast XML output (Bio.Entrez will tell you which one and where to store it). If so, please let us know, so we can include the required DTDs in the next release of Biopython.
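A quick way to check what the XML actually contains, before committing to any parser, is to count the element tags it uses. This sketch uses the standard library's ElementTree.iterparse rather than Bio.Entrez.read, so no DTD files are involved; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_tag_counts(source):
    """Count how often each element tag occurs in an XML file or file object.

    Streams with iterparse so a large PSI-Blast output is not held in memory.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop the element's contents once it has been counted
    return counts
```

Printing `sorted(element_tag_counts("psiblast.xml").items())` (again a hypothetical file name) shows at a glance whether anything beyond the per-hit Hsp elements is present.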

--Michiel.

--- On Mon, 2/7/11, Brett Bowman <bnbowman at gmail.com> wrote:

> From: Brett Bowman <bnbowman at gmail.com>
> Subject: [Biopython] Pulling Alignment From PSI-Blast Output
> To: biopython at biopython.org
> Date: Monday, February 7, 2011, 5:30 PM
>
> I'm trying to use the PSI-Blast results from a series of proteins to detect
> distant homologues, using HMMs of various sorts. Currently I'm pulling down
> the sequence IDs with PSI-Blast, downloading the full sequences from NCBI,
> then aligning everything with ClustalW or Muscle. However this is eating up
> way more processor time than I have to spare, so I want to just pull the
> full multi-sequence alignment from the PSI-blast results if possible (OUTFMT
> option #3 or 4), for use in building the HMMs. But it doesn't look like
> AlignIO has a module for reading the peculiar format that PSI-Blast
> generates...
>
> Has this been done before, or will I need to write my own parser?
>
> Brett Bowman
> Woelk Lab
> UCSD School of Medicine
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
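However the alignment is obtained (ClustalW or Muscle output, or rows rebuilt from the PSI-Blast hits), it has to end up in a format the HMM tool reads. HMMER's hmmbuild, for one, accepts Stockholm. A minimal writer, assuming the alignment is held as a dict of equal-length gapped strings (a layout chosen here for illustration, with a hypothetical function name); for a Biopython alignment object, Bio.AlignIO.write(alignment, handle, "stockholm") does the same job.

```python
def to_stockholm(rows):
    """Format a dict of name -> aligned sequence as a Stockholm alignment.

    Names must not contain whitespace, and all rows must be the same length.
    """
    if not rows:
        raise ValueError("no sequences to write")
    if len({len(seq) for seq in rows.values()}) > 1:
        raise ValueError("all rows must have the same aligned length")
    width = max(len(name) for name in rows)  # pad names so columns line up
    lines = ["# STOCKHOLM 1.0", ""]
    for name, seq in rows.items():
        lines.append(f"{name.ljust(width)} {seq}")
    lines.append("//")  # end-of-alignment marker
    return "\n".join(lines) + "\n"
```

Writing the returned string to a .sto file gives something hmmbuild can be pointed at directly.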