[Biopython-dev] 7/5 biopython Questions - BioStar
Feed My Inbox
updates at feedmyinbox.com
Tue Jul 5 10:56:46 UTC 2011
// GenBank to Fasta failing with CONTIG fields
// July 5, 2011 at 6:31 AM
I used to generate FASTA out of my GenBank source files using a simple conversion script:
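import sys, signal
from Bio import SeqIO

def wrap( text, width=80 ):
    # Yield successive slices of text, each at most width characters long.
    for i in xrange( 0, len( text ), width ):
        yield text[i:i+width]

if __name__ == "__main__":
    # progress() is defined elsewhere in the script.
    status = progress()
    for record in SeqIO.parse( sys.stdin, "genbank"):
        # Not every record carries a GI number.
        try:
            gi = record.annotations["gi"]
        except KeyError:
            gi = None
        accession = record.id
        desc = record.description
        seq = record.seq
        locus = record.name
        print ">gi|%s|emb|%s|%s| %s" % (gi, accession, locus, desc)
        for block in wrap( seq ):
            print block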
When I changed the sequence files to newer versions some of the resulting FASTA file sequences were just filled with Ns. After closer inspection of the GenBank source files, it turns out that they have replaced the ORIGIN block
with a CONTIG block, something like
Is there a way to resolve this using BioPython?
I was working with BioPython 1.52 and 1.57 (latest).
Thanks for your suggestions.
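
A GenBank record that carries a CONTIG line instead of an ORIGIN block describes how the sequence is assembled from other records rather than containing the residues itself, which would be consistent with the runs of N seen after conversion. Before converting, it may help to flag such records. The sketch below does this with plain text scanning and no Biopython calls, written in Python 3 syntax; the function name find_contig_only_records and the sample text are made up for illustration. The full sequence then has to be obtained separately, for example by re-downloading the record from NCBI in a form that includes the sequence (the gbwithparts return type of Entrez EFetch is one option worth checking).

```python
def find_contig_only_records(lines):
    """Return LOCUS names of records that have a CONTIG line but no ORIGIN block."""
    flagged = []
    locus, has_contig, has_origin = None, False, False
    for line in lines:
        if line.startswith("LOCUS"):
            parts = line.split()
            locus = parts[1] if len(parts) > 1 else None
            has_contig = has_origin = False
        elif line.startswith("CONTIG"):
            has_contig = True
        elif line.startswith("ORIGIN"):
            has_origin = True
        elif line.startswith("//"):
            # End of record: flag it if only a CONTIG line was seen.
            if has_contig and not has_origin:
                flagged.append(locus)
            locus, has_contig, has_origin = None, False, False
    return flagged


# Hypothetical two-record sample, heavily abbreviated.
SAMPLE = """\
LOCUS       REC_A    12 bp    DNA
ORIGIN
        1 acgtacgtac gt
//
LOCUS       REC_B    5000 bp    DNA
CONTIG      join(PART_1.1:1..2500,PART_2.1:1..2500)
//
"""

print(find_contig_only_records(SAMPLE.splitlines()))  # ['REC_B']
```

Records flagged this way are the ones to re-fetch with their sequence included before running the FASTA conversion.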
// Parsing BLAST output BioPython Error
// July 5, 2011 at 2:25 AM
I have the following code
print "Running BLAST .........."
cmd=subprocess.Popen("blastp -db nr -query repeat.txt -out out.faa -evalue 0.001 -gapopen 11 -gapextend 1 -matrix BLOSUM62 -remote -outfmt 5",shell=True)
blast_records = NCBIXML.parse(f1)
save_file = open("my_fasta_seq.fasta", 'w')
for blast_record in blast_records[:10]:
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            save_file.write('>%s\n' % (alignment.hseq,))
for record in SeqIO.parse(f2,"fasta"):
I get a TypeError on the line "for blast_record in blast_records[:10]:" saying 'generator' object is not subscriptable.
I am looking to get the top 10 BLAST hits (sequences).
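
The TypeError arises because the parser returns a generator, and a generator cannot be sliced with [:10]. The standard library's itertools.islice takes the first N items of any iterator without building the whole list first. The sketch below uses a stand-in generator, fake_records, in place of the real parser so that it runs without any BLAST output; that name and its contents are illustrative only.

```python
from itertools import islice


def fake_records(n):
    """Stand-in for a parser that yields one record per query (hypothetical)."""
    for i in range(n):
        yield "record_%d" % i


records = fake_records(25)

# records[:10] would raise TypeError; islice pulls items lazily instead.
first_ten = list(islice(records, 10))
print(len(first_ten), first_ten[0], first_ten[-1])  # 10 record_0 record_9
```

Note that the parser yields one record per query, so limiting the records limits the number of queries, not the number of hits. For the top 10 hits of a query, the slice most likely belongs on the per-record list of alignments (for example blast_record.alignments[:10]), which is an ordinary list and can be sliced directly.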
// Getting top 10 sequences of BLAST results Bio Python
// July 5, 2011 at 12:29 AM
I want to get the top 10 sequences from BLAST results (just the sequences, no alignment, score, e-value, etc.). My input is a text file containing 5 FASTA sequences, so my output should be the top 10 BLAST hits for each of them, giving 50 sequences in the output file.
I am reading each input sequence with Bio.SeqIO, writing it to temp.faa, and then passing it to command-line BLAST through subprocess as
blastp -db nr -query temp.faa -out out.faa -evalue 0.001 -gapopen 11 -gapextend 1 -matrix BLOSUM62 -remote -outfmt 2
The output has lots of other information. Should I parse this output now, or is there a better way?
P.S. XML might be the way, but I didn't find relevant NCBIXML parser syntax.
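
BLAST+ writes XML with -outfmt 5 (the -outfmt 2 used above is a query-anchored text layout), and that XML has one Iteration element per query with its Hit elements in the order BLAST reports them. The sketch below pulls the first N hit sequences per query using the standard library's xml.etree.ElementTree rather than Bio.Blast.NCBIXML, so that it runs without Biopython; the function name top_hit_sequences and the tiny XML sample are made up for illustration. It takes only the first HSP of each hit, and Hsp_hseq is the aligned part of the subject, not the full database sequence.

```python
import xml.etree.ElementTree as ET


def top_hit_sequences(xml_text, limit=10):
    """Return a list of (query definition, [(hit definition, hit sequence), ...])."""
    root = ET.fromstring(xml_text)
    results = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        hits = []
        for hit in iteration.iter("Hit"):
            if len(hits) >= limit:
                break
            hsp = hit.find("Hit_hsps/Hsp")  # first HSP of this hit
            if hsp is not None:
                # Remove alignment gap characters from the subject sequence.
                seq = hsp.findtext("Hsp_hseq", default="").replace("-", "")
                hits.append((hit.findtext("Hit_def"), seq))
        results.append((query, hits))
    return results


# Hypothetical, heavily trimmed BLAST XML with one query and three hits.
SAMPLE_XML = """<BlastOutput><BlastOutput_iterations>
<Iteration>
  <Iteration_query-def>query1</Iteration_query-def>
  <Iteration_hits>
    <Hit><Hit_def>hitA</Hit_def><Hit_hsps><Hsp><Hsp_hseq>MKT-AY</Hsp_hseq></Hsp></Hit_hsps></Hit>
    <Hit><Hit_def>hitB</Hit_def><Hit_hsps><Hsp><Hsp_hseq>MKTLAY</Hsp_hseq></Hsp></Hit_hsps></Hit>
    <Hit><Hit_def>hitC</Hit_def><Hit_hsps><Hsp><Hsp_hseq>MRTLAY</Hsp_hseq></Hsp></Hit_hsps></Hit>
  </Iteration_hits>
</Iteration>
</BlastOutput_iterations></BlastOutput>"""

for query, hits in top_hit_sequences(SAMPLE_XML, limit=2):
    for hit_def, seq in hits:
        print(">%s\n%s" % (hit_def, seq))
```

With five queries in one input file, a single BLAST run would produce five Iteration elements, so the per-query temp.faa step may not be needed. BLAST+ also has a -max_target_seqs option that may be worth checking as a way to limit the hits at the source.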