[EMBOSS] Match mass sequences for mass sequences

Peter Rice pmr at ebi.ac.uk
Tue Nov 20 08:58:21 UTC 2007

> Dear Sir
> I'm sorry for a perhaps naive question.
> I want to align 1000 pairs of sequences. For example, file "A"
> contains 1000 sequences and file "B" contains 1000 sequences, and the two
> files will be compared. I'd like to find a certain sequence (X gene) in
> file A that has high sequence similarity with some sequence (X' gene)
> in file B. Then a certain gene (Y) in file "A" will be matched with the Y'
> gene that has high identity in file B. Finally, I want to get the
> 1000 matched pairs and their identity scores. Can I match a large number
> of sequences at one time using Jemboss? How can I handle this problem?

In EMBOSS 5.0.0 the wordfinder program is designed to do this. It uses a 
word-based algorithm (words of n consecutive identical bases) and then aligns 
using a limited window size. One warning: the alignment includes the 
original word match, which (in low-identity cases) may not give the 
highest alignment score.
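The word-then-window idea can be sketched in Python. This is not the wordfinder source; it is a minimal illustration under stated assumptions: a seed is the first exact word of length `k` shared by two sequences, extension is a Smith-Waterman local alignment confined to `width` diagonals either side of the seed diagonal, and the scores (`match=2`, `mismatch=-1`, `gap=-2`) are arbitrary. The function names `find_seed`, `banded_local` and `best_matches` are hypothetical. Restricting the alignment to a window around one seed can, like the caveat above, miss a higher-scoring alignment elsewhere.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


def find_seed(a: str, b: str, k: int) -> Optional[Tuple[int, int]]:
    """Return (i, j) for the first length-k word of `a` that also occurs in `b`."""
    index = defaultdict(list)  # word -> start positions in b
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    for i in range(len(a) - k + 1):
        hits = index.get(a[i:i + k])
        if hits:
            return i, hits[0]
    return None


def banded_local(a: str, b: str, diag: int, width: int,
                 match: int = 2, mismatch: int = -1,
                 gap: int = -2) -> Tuple[int, float]:
    """Smith-Waterman restricted to cells with |(j - i) - diag| <= width.

    Returns (best score, identity = identical columns / alignment columns).
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    # Traceback pointers: 0 = stop, 1 = diagonal, 2 = up (gap in b), 3 = left (gap in a)
    P = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        # Only fill the window of diagonals around the seed; other cells stay 0.
        for j in range(max(1, i + diag - width), min(m, i + diag + width) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            options = (0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            move = max(range(4), key=lambda t: options[t])
            H[i][j], P[i][j] = options[move], move
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)
    # Walk back from the best cell, counting identical and total columns.
    i, j = best_cell
    identical = columns = 0
    while P[i][j] != 0:
        move = P[i][j]
        if move == 1:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif move == 2:
            i -= 1
        else:
            j -= 1
        columns += 1
    return best, (identical / columns if columns else 0.0)


def best_matches(seqs_a: Dict[str, str], seqs_b: Dict[str, str],
                 k: int = 4, width: int = 3
                 ) -> Dict[str, Optional[Tuple[str, int, float]]]:
    """For each A sequence, keep the seeded B sequence with the highest banded score."""
    results = {}
    for name_a, a in seqs_a.items():
        best = None
        for name_b, b in seqs_b.items():
            seed = find_seed(a, b, k)
            if seed is None:
                continue  # no shared word: this pair is never aligned
            i0, j0 = seed
            score, identity = banded_local(a, b, j0 - i0, width)
            if best is None or score > best[1]:
                best = (name_b, score, identity)
        results[name_a] = best
    return results


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    A = {"X": "ACGTACGTTGCA", "Y": "TTTTGGGGCCCC"}
    B = {"X'": "ACGTACGATGCA", "Y'": "TTTTGGGCCCCA"}
    for name, hit in best_matches(A, B).items():
        print(name, hit)  # (best B name, score, identity)
```

Comparing every A sequence with every B sequence is quadratic in the number of sequences, and this sketch rebuilds the word index for each pair; a real tool would index the words once and only align pairs that share a word.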

Wordfinder has additional options to select the matches you want.

Older EMBOSS releases had only supermatcher, which is less sophisticated 
in selecting matches.

Hope that helps

Peter Rice
