[EMBOSS] Many-to-many with needle and water

Peter Rice pmr at ebi.ac.uk
Mon Jul 6 10:35:12 UTC 2009


Peter Cock or biopython wrote:
> Hi Peter R. et al,
> 
> I gather EMBOSS is looking for feedback for new applications (given
> the recent funding from the BBSRC - congratulations again). How about
> suggestions for extensions to existing EMBOSS applications?
> 
> I've used bits of EMBOSS for several years now (thank you!). Something
> I have sometimes wanted to do is a many-to-many pairwise sequence
> alignment with the EMBOSS tools needle and water.
> 
> Right now, needle and water take two files (here referred to as A and
> B), file A has just one sequence, and file B can have one or more
> sequences. I'd like to be able to supply two files both with multiple
> entries, and have needle/water do pairwise alignments between all the
> sequences in A against all the sequences in B. This might be useful
> for finding reciprocal best hits in comparative genomics (as a slower
> but exact alternative to FASTA or BLAST).

The application is easy to add (after the release).

The usual problem with all-against-all is that it involves loading one
of the inputs as a sequence set entirely in memory - to avoid reading
one input many times over.
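
As a rough illustration of that pattern (a Python sketch, not EMBOSS
code): one input is read once into memory and the other is streamed a
record at a time. The names read_fasta, match_count and all_against_all
are hypothetical, and match_count is only a stand-in for a real
needle/water-style alignment score.

```python
def read_fasta(handle):
    """Yield (name, sequence) tuples from a FASTA-formatted file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def match_count(a, b):
    """Placeholder score (assumption): identical positions, ungapped, left-aligned."""
    return sum(x == y for x, y in zip(a, b))


def all_against_all(stream_handle, set_handle, score=match_count):
    """Score every streamed sequence against every sequence held in memory."""
    in_memory = list(read_fasta(set_handle))         # read once, kept in memory
    for a_name, a_seq in read_fasta(stream_handle):  # read once, one record at a time
        for b_name, b_seq in in_memory:
            yield a_name, b_name, score(a_seq, b_seq)
```

Memory use grows with the in-memory set only, which is why the size of
the smaller input matters.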

We have an application, supermatcher, which does this - the first sequence
is streamed through, the second is a sequence set loaded into memory. It
uses word matching to find seed alignments, then runs a limited alignment
around the hits.
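
The general idea of word-match seeding can be sketched in Python as
follows. This is not supermatcher's actual implementation: the function
names, the word_size and width parameters, and the choice to express the
limited region as a range of diagonals are all assumptions made for
illustration.

```python
from collections import defaultdict


def build_word_index(seq, word_size):
    """Map each word of length word_size to its start positions in seq."""
    index = defaultdict(list)
    for i in range(len(seq) - word_size + 1):
        index[seq[i:i + word_size]].append(i)
    return index


def find_seeds(query, index, word_size):
    """Return (query_pos, target_pos) pairs where both sequences share a word."""
    seeds = []
    for q in range(len(query) - word_size + 1):
        for t in index.get(query[q:q + word_size], ()):
            seeds.append((q, t))
    return seeds


def alignment_window(seeds, width):
    """Diagonal range (target_pos - query_pos) covering all seeds, padded by width.

    Returns None when there are no seeds, i.e. nothing worth aligning.
    """
    if not seeds:
        return None
    diagonals = [t - q for q, t in seeds]
    return min(diagonals) - width, max(diagonals) + width
```

A full alignment would then only need to fill in cells whose diagonal
falls inside the returned range, rather than the whole matrix.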

superwater would be a possible name (or superneedle).

How popular would such a program be?

How large would the smaller input set be?

regards,

Peter



