[BioPython] how to convert file full of BLAST runs into a FASTA file of sequences?

jchen at alumni.caltech.edu
Thu Apr 9 22:14:02 UTC 2009


Hi Peter,

> Do you just want the FASTA file to contain the matched region of the
> sequences in the database?  That information should be in the BLAST
> output - you'll need to remove any gap characters.
>
> If you want the full sequence of each matched target, that isn't in
> the BLAST output.  You'd have to take the reference number and look it up.
>  If you made the database yourself from a FASTA file, that should be
> easy.  If it was from NR/NT or another large database then maybe
> fetching the sequences from the NCBI would be easiest (try
> Bio.Entrez).
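For the first case Peter describes (only the matched region), a minimal sketch of the gap-stripping step might look like the following. It assumes you have already pulled a subject identifier and the aligned subject string of one HSP out of the BLAST output; `hsp_to_fasta` is a hypothetical helper, not a Biopython function.

```python
def hsp_to_fasta(hit_id: str, hsp_subject: str, width: int = 60) -> str:
    """Format the matched (subject) region of one HSP as a FASTA record.

    hsp_to_fasta is a hypothetical helper written for illustration.
    """
    # Gap characters ('-') come from the alignment, not the real sequence.
    ungapped = hsp_subject.replace("-", "")
    lines = [">" + hit_id]
    # Wrap the residues at a fixed line width.
    for start in range(0, len(ungapped), width):
        lines.append(ungapped[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical identifier and aligned subject string.
    print(hsp_to_fasta("hit_1", "MKT-AYIAK--QRQISFVK", width=10), end="")
```

With `width=10` this prints `>hit_1`, then `MKTAYIAKQR`, then `QISFVK`.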

Yeah, I actually do want the full-length FASTA sequences. I didn't think
about the fact that the BLAST output only contains (partial) match
regions. I have a FASTA file of the entire proteome for the organism we
are studying.

> Are you sure you are using the XML output?
>
> With the plain text output and BLAST v.2.2.18, Biopython can only cope
> with single query output.  The NCBI regularly change their plain text
> output, and we have more-or-less given up on our plain text
> parser.  The NCBI themselves do not recommend parsing it - that is
> what the XML format was introduced for.
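As a rough illustration of why the XML route is easier, here is a minimal sketch that uses only the standard library's `xml.etree.ElementTree` to map each query to its hits below an e-value cutoff. It assumes the NCBI BLAST XML element names (`Iteration`, `Iteration_query-def`, `Hit`, `Hit_id`, `Hsp_evalue`) and a single well-formed XML document per file. Some older `blastall` releases wrote one XML document per query back to back, which `ET.parse` would reject but Biopython's `Bio.Blast.NCBIXML.parse` is meant to handle. The function name `hits_per_query` and the 1e-5 default cutoff are illustrative choices. Depending on how the database was formatted, the useful identifier may be in `Hit_def` rather than `Hit_id`.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, List, TextIO, Union


def hits_per_query(source: Union[str, TextIO],
                   max_evalue: float = 1e-5) -> Dict[str, List[str]]:
    """Map each query to the hits whose best HSP meets an e-value cutoff.

    `source` is a file name or an open text file containing NCBI BLAST XML.
    Each query is one <Iteration> element, so many queries per file are fine.
    """
    results: Dict[str, List[str]] = {}
    root = ET.parse(source).getroot()
    for iteration in root.iter("Iteration"):
        query = (iteration.findtext("Iteration_query-def") or "").strip()
        keep: List[str] = []
        for hit in iteration.iter("Hit"):
            # A hit can have several HSPs; judge it by its best (lowest) e-value.
            evalues = [float(e.text) for e in hit.iter("Hsp_evalue")]
            if evalues and min(evalues) <= max_evalue:
                keep.append((hit.findtext("Hit_id") or "").strip())
        results[query] = keep
    return results


# Tiny hand-written example in the BLAST XML layout (not real BLAST output).
EXAMPLE_XML = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query_A</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_id>prot_1</Hit_id>
          <Hit_hsps><Hsp><Hsp_evalue>1e-30</Hsp_evalue></Hsp></Hit_hsps>
        </Hit>
        <Hit>
          <Hit_id>prot_2</Hit_id>
          <Hit_hsps><Hsp><Hsp_evalue>0.5</Hsp_evalue></Hsp></Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""

if __name__ == "__main__":
    print(hits_per_query(io.StringIO(EXAMPLE_XML)))
```

On the example this prints `{'query_A': ['prot_1']}`, since `prot_2` only has an HSP with e-value 0.5.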
>

It's unfortunate that there's no standard BLAST format. Yeah, I am trying to
parse the plain text BLAST output. I'm not familiar with the XML output -
I don't know how to get BLAST to write its output in XML format.
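For reference, XML output is selected on the command line: `-m 7` for the legacy `blastall` program (the BLAST 2.2.18 era tools discussed above) and `-outfmt 5` for the newer BLAST+ programs such as `blastp`. The sketch below assumes a protein search; `blast_xml_command`, `run_blast`, and the file and database names are hypothetical. It only builds the argument list and hands it to `subprocess`; check the flags against the usage message of your installed BLAST version.

```python
import subprocess
from typing import List


def blast_xml_command(query_fasta: str, database: str, out_xml: str,
                      legacy: bool = False) -> List[str]:
    """Build a protein BLAST command line that writes XML output.

    legacy=True targets the old ``blastall`` program (``-m 7`` selects XML);
    otherwise the BLAST+ ``blastp`` program is used (``-outfmt 5`` selects XML).
    """
    if legacy:
        return ["blastall", "-p", "blastp", "-d", database,
                "-i", query_fasta, "-o", out_xml, "-m", "7"]
    return ["blastp", "-query", query_fasta, "-db", database,
            "-out", out_xml, "-outfmt", "5"]


def run_blast(cmd: List[str]) -> None:
    # Requires BLAST on PATH; raises CalledProcessError on a non-zero exit.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # File and database names are hypothetical placeholders.
    cmd = blast_xml_command("queries.fasta", "proteome_db", "results.xml")
    print(" ".join(cmd))
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems with file names that contain spaces.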

My file contains a few hundred queries. I ended up writing a little script
that extracted the name of each query and each of its significant hits. I
will probably end up writing my own scripts to pull the full-length FASTA
sequence of each of these hits out of the proteome FASTA file.
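A minimal sketch of that last step in plain Python, assuming each hit identifier matches the first word of a header line in the proteome FASTA file (whether it does depends on how the BLAST database was built). `read_fasta` and `extract_hits` are hypothetical names. Biopython's `Bio.SeqIO` (for example `SeqIO.parse` with `SeqIO.write`, or `SeqIO.index` for random access by ID) covers the same job.

```python
import io
from typing import Iterable, Iterator, List, Optional, TextIO, Tuple


def _ident(header: str) -> str:
    # Identifier = first whitespace-separated word after '>'.
    words = header[1:].split()
    return words[0] if words else ""


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, header line, sequence) for each FASTA record."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield _ident(header), header, "".join(chunks)
            header, chunks = line, []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield _ident(header), header, "".join(chunks)


def extract_hits(proteome: TextIO, hit_ids: Iterable[str], out: TextIO,
                 width: int = 60) -> int:
    """Copy full-length records whose identifier is in hit_ids to `out`.

    Returns how many records were written; IDs absent from the proteome
    are silently skipped.
    """
    wanted = set(hit_ids)
    written = 0
    for ident, header, seq in read_fasta(proteome):
        if ident in wanted:
            out.write(header + "\n")
            for start in range(0, len(seq), width):
                out.write(seq[start:start + width] + "\n")
            written += 1
    return written


if __name__ == "__main__":
    # Stand-ins for open("proteome.fasta") and open("hits.fasta", "w").
    proteome = io.StringIO(">prot_1 kinase\nMKTAYIAK\nQRQISFVK\n>prot_2 other\nMSTNPKPQ\n")
    out = io.StringIO()
    print(extract_hits(proteome, ["prot_1", "prot_9"], out))
    print(out.getvalue(), end="")
```

The example prints `1`, then the rejoined record `>prot_1 kinase` / `MKTAYIAKQRQISFVK`. Streaming the proteome once and checking membership in a set keeps memory use low even for a few hundred queries' worth of hits; the 60-residue line width is just a common convention.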

> I can't offer any more advice without the error message, your OS (e.g.
> Windows XP), version of Python, version of Biopython and ideally a
> snippet of your code which is failing.

That's alright. It will be easier for me to write my own little scripts to
parse my BLAST output file. I was just hoping there was an easy, fast way
to do it with Biopython.

Thanks for your help!
-Jerry





More information about the Biopython mailing list