[EMBOSS] Many-to-many with needle and water

Peter Rice pmr at ebi.ac.uk
Mon Jul 6 11:19:30 UTC 2009


Peter C wrote:
> [I suppose you could do something a bit more cunning like start by
> caching the sequences as you read them, ready for re-use, but if the
> number of sequences crosses a threshold, stop caching and switch
> to re-reading the file for subsequent loops?]

Tricky. Rereading is not always possible, for example when the data
source is streamed standard input.
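
To make the trade-off concrete, here is a rough sketch in Python (not
EMBOSS code) of the cache-then-reread idea and of where it breaks down.
The names read_fasta, SequenceSource and max_cached are all made up for
illustration, and the FASTA parsing is deliberately minimal. The point
is the last branch: once the cache has overflowed, a second pass is only
possible if the input can be rewound.

```python
import io


def read_fasta(handle):
    """Yield (identifier, sequence) tuples from a FASTA-formatted handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header line as the identifier.
            name = line[1:].split(None, 1)[0] if len(line) > 1 else ""
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


class SequenceSource:
    """Replay a sequence set, caching at most max_cached records.

    max_cached is a hypothetical threshold, not an EMBOSS option.
    """

    def __init__(self, handle, max_cached=1000):
        self.handle = handle
        self.max_cached = max_cached
        self.cache = []
        self.complete = False    # True once the whole input is cached
        self.overflowed = False  # True once the threshold was crossed
        self.started = False     # True once a first pass has begun

    def __iter__(self):
        if self.complete:
            # Small input: later passes come straight from memory.
            return iter(self.cache)
        if self.started:
            # No complete cache, so a later pass has to reread the input.
            if not self.handle.seekable():
                raise io.UnsupportedOperation(
                    "input is a stream and cannot be reread")
            self.handle.seek(0)
        self.started = True
        return self._scan()

    def _scan(self):
        # Cache while reading; give up once the threshold is crossed.
        cache = None if self.overflowed else []
        for record in read_fasta(self.handle):
            if cache is not None:
                if len(cache) < self.max_cached:
                    cache.append(record)
                else:
                    cache = None
                    self.overflowed = True
            yield record
        if cache is not None:
            self.cache, self.complete = cache, True
```

With a regular file both paths work. With piped standard input the
first pass works, but a second pass fails as soon as the set is larger
than the threshold, and that cannot be known before the data has been
read.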

> Perhaps others on the list can think of better uses for this tool idea?

Let's see what response we get. One never knows until the question is
asked :-)

>> How large would the smaller input set be?
> 
> Hard to say without specific examples in mind. For some hand-waving
> upper limits, for comparative genomics of bacteria using protein
> sequences, you might have a few thousand in each file. If I was trying
> this as part of an ad-hoc clustering algorithm (all-against-all), again
> maybe a few thousand sequences. In practice, a heuristic tool like
> supermatcher (or FASTA or BLAST) would probably be more sensible
> for large datasets like this due to the computational time.
> 
> I see needle and water as most useful on smaller datasets where
> the runtime cost of using an exact algorithm isn't too high. Therefore
> many-to-many needle/water searches may be best targeted at
> smaller sequence files. Things might be different with a multicore
> or GPU/OpenCL version of needle and water ;)

Multicore would be a possibility - at least on systems configured for
it. We are looking into picking up methods from the BioManyCores project.

> Anyway, unless someone else thinks a many-to-many version
> of needle and water would be useful, I wouldn't expect you to
> implement this. I'm just putting the idea forward for discussion.

Implementing is easy - we could simply send you the code to install
locally if nobody else needs it :-)

After all, it is only a minor modification to the existing applications.
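
As an illustration of why the change is small, the many-to-many case is
just one more loop around the pairwise comparison. The Python sketch
below is not the needle source: global_score is a toy Needleman-Wunsch
scorer with a linear gap penalty and made-up match/mismatch values,
whereas needle itself uses a substitution matrix and affine gap
penalties. The function names and defaults are assumptions for the
example only.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Toy global alignment score (linear gap penalty, two-row DP)."""
    # prev[j] holds the best score for a[:i-1] against b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def many_to_many(set_a, set_b, score=global_score):
    """Yield (name_a, name_b, score) for every pair across the two sets.

    set_b is walked once per record in set_a, so it has to be
    re-iterable (a list here).
    """
    for name_a, seq_a in set_a:
        for name_b, seq_b in set_b:
            yield name_a, name_b, score(seq_a, seq_b)


if __name__ == "__main__":
    first = [("a1", "ACGT"), ("a2", "AGT")]
    second = [("b1", "ACGT"), ("b2", "TTTT")]
    for row in many_to_many(first, second):
        print(*row, sep="\t")
```

The number of alignments is the product of the two set sizes, which is
why this is only attractive for the smaller inputs discussed above.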

regards,

Peter


