[BioPython] Parsing BLAST

Peter biopython at maubp.freeserve.co.uk
Wed Aug 27 21:44:47 UTC 2008


> I do have an additional request: once I parse these out, I only get 50
> entries. however, if I do the same search online, I get 138... what
> accounts for the difference?
>
> This is my code:
>
> from Bio import SeqIO
> from Bio.Blast import NCBIWWW
> from Bio.Blast import NCBIXML
>
> record = SeqIO.read(open("protein_fasta.txt"), format="fasta")
> result_handle = NCBIWWW.qblast("blastp", "nr", record.seq.tostring())
>
> blast_records = NCBIXML.parse(result_handle)
> blast_record = blast_records.next()
>
> for x in blast_record.alignments:
>    print x.title, x.accession, x.length
>
> acc_list = []
> for x in blast_record.alignments:
>    acc_list.append(x.accession)
>
> len(acc_list) tells me 50...
>
> Is there a default limit somewhere?

Yes there is.  At the Python prompt (or in IDLE), try:

>>> from Bio.Blast import NCBIWWW
>>> help(NCBIWWW.qblast)

(You can try this trick on all Python objects and functions, although
not everything has help text defined.)
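
If you would rather see the defaults programmatically than read through the
help text, something like the following rough sketch may help. It assumes
Python 3 (inspect.signature is not available in Python 2). fake_qblast and
keyword_defaults are hypothetical names made up for illustration; fake_qblast
only mimics the one default discussed in this thread and is not the real
qblast signature.

```python
import inspect


def fake_qblast(program, database, sequence, hitlist_size=50):
    """Hypothetical stand-in for a function with a keyword default."""
    return None


def keyword_defaults(func):
    """Return a dict of parameter name -> default for parameters that have one."""
    signature = inspect.signature(func)
    return {
        name: parameter.default
        for name, parameter in signature.parameters.items()
        if parameter.default is not inspect.Parameter.empty
    }


defaults = keyword_defaults(fake_qblast)
print(defaults)
```

With Biopython installed, passing NCBIWWW.qblast instead of the stand-in
should list its keyword defaults in the same way.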

I think you probably want to override the default of hitlist_size=50, so try changing:

result_handle = NCBIWWW.qblast("blastp", "nr", record.seq.tostring())

to:

result_handle = NCBIWWW.qblast("blastp", "nr", record.seq.tostring(),
                               hitlist_size=200)
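
For anyone on Python 3, the whole script might look roughly like the sketch
below. It assumes a Biopython release that still ships NCBIWWW and NCBIXML;
the function name fetch_accessions is made up for illustration. Calling it
sends a live query to the NCBI servers, so it can take a while.

```python
def fetch_accessions(fasta_path, hitlist_size=200):
    """Run blastp against nr for one FASTA record and return hit accessions."""
    # Imported here so the sketch can be loaded without Biopython installed.
    from Bio import SeqIO
    from Bio.Blast import NCBIWWW, NCBIXML

    record = SeqIO.read(fasta_path, "fasta")

    # str(record.seq) replaces the older record.seq.tostring().
    result_handle = NCBIWWW.qblast(
        "blastp", "nr", str(record.seq), hitlist_size=hitlist_size
    )

    # next(...) replaces the Python 2 style blast_records.next().
    blast_record = next(NCBIXML.parse(result_handle))

    return [alignment.accession for alignment in blast_record.alignments]
```

Usage would be along the lines of
acc_list = fetch_accessions("protein_fasta.txt"), after which len(acc_list)
can be at most the hitlist_size requested.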

Peter


