[Biopython] Blast DB keeps crashing nodes
Dilara Ally
dilara.ally at gmail.com
Sat Oct 15 21:55:21 UTC 2011
How many hits per sequence have you asked BLAST to return? The default
for blastall is 250. I ran a BLAST search on ~600,000 contigs, but I
set up simultaneous jobs across 34 nodes and kept only the top 20 hits.
Each file held 1,000 FASTA-formatted sequences, and each node was given
~12 files. Even so, I had to run it in two parts to get all the
sequences blasted: I waited until the first set finished before setting
up the second BLAST job. The whole run finished in 2 days. Before
running it on the cluster, I tested a single file to see how long it
took and how much memory it used. The cluster had 34 compute nodes,
each with 16-48 cores and 16-64 GB of memory.
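
In case it is useful, here is a minimal sketch of the chunking described
above: split one large FASTA file into files of 1,000 records, then
build a blastall command per chunk that keeps only the top 20 hits. It
is not the script I actually ran. The function names, the chunk file
naming, the blastn program, and the XML output format are assumptions
chosen for illustration. The command is only built, not executed, so
you can hand it to whatever job scheduler your cluster uses.

```python
import itertools
from pathlib import Path


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(fasta_path, out_dir, per_file=1000):
    """Write per_file records to each chunk file; return the chunk paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with open(fasta_path) as handle:
        records = read_fasta(handle)
        for index in itertools.count():
            # Pull the next batch lazily so the whole file is never in memory.
            batch = list(itertools.islice(records, per_file))
            if not batch:
                break
            path = out_dir / f"chunk_{index:05d}.fasta"  # hypothetical naming
            with open(path, "w") as out:
                for header, seq in batch:
                    out.write(f">{header}\n{seq}\n")
            paths.append(path)
    return paths


def blastall_command(query, database, output, program="blastn", max_hits=20):
    """Build (but do not run) a legacy blastall command limited to max_hits."""
    return [
        "blastall",
        "-p", program,          # assumed program; use blastp etc. as needed
        "-d", str(database),
        "-i", str(query),
        "-o", str(output),
        "-m", "7",              # XML output (an assumption, not a requirement)
        "-v", str(max_hits),    # number of one-line descriptions to report
        "-b", str(max_hits),    # number of alignments to report
    ]
```

Each chunk then becomes one independent blastall process, so memory is
released when that process exits, and a single chunk can be timed and
profiled before the full run is submitted.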
Hope that helps.
On 10/15/11 1:59 PM, Willis, Jordan R wrote:
> Hello Biopython,
>
> I was wondering if anyone has worked extensively with the Blast Database locally.
>
> I am blasting millions of sequences using Biopython as my backend framework. I am using a high-throughput computer cluster to blast each sequence. Rather than submit two million jobs, I have divided the FASTA files up into 50 or so.
>
> The problem I am facing is a memory issue. I'm not sure, but I think that the database is caching itself and not clearing before the next sequence is queried. In that regard, the next job calls upon the database again, and so on….
>
> The memory builds up until it finally crashes the node. Has anyone dealt with this issue before?
>
> Thanks,
> Jordan