[EMBOSS] display of long ensembl and vega identifiers in alignments

Peter Rice pmr at ebi.ac.uk
Fri Aug 11 13:53:23 UTC 2006

Hans Rudolf Hotz wrote:

> A few months back, I played around with the source code and changed one
> of the library files (ajalign.c). This now allows the display of up to 20
> characters, by using a new output format "pairln" for sequence alignment
> programs, like matcher or needle. This is in comparison to the default
> which displays only the first 6 characters, or "pair" which displays the
> first 13 characters, eg:

We can make the ID arbitrarily long for a "new" alignment format. We 
will need formats similar to the existing matcher and needle outputs to 
avoid breaking too many existing parsers (I remember when NCBI changed 
the use of a blank at the start of each line of BLAST output and almost 
all parsers had to change). The formats are easy to make (as you found 
out) from the existing ones.
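
To illustrate why a wider ID column is a small change for the writer but a 
breaking one for a fixed-column parser, here is a rough Python sketch. The 
row layout, the function name format_row and its parameters are made up 
for this example and are not the actual EMBOSS output code (which is C, in 
ajalign.c); only the widths 6, 13 and 20 come from the discussion above.

```python
def format_row(name, start, residues, end, name_width=6):
    """Format one alignment row with a fixed-width ID column.

    Hypothetical layout for illustration only: ID, start position,
    residues, end position, separated by single spaces.
    """
    # Truncate long IDs and pad short ones so the column width is constant.
    label = name[:name_width].ljust(name_width)
    return f"{label} {start:>6} {residues} {end:>6}"


if __name__ == "__main__":
    ident = "ENSG00000139618"
    # Widths mentioned in the thread: default (6), "pair" (13), "pairln" (20).
    for width in (6, 13, 20):
        print(format_row(ident, 1, "ACGTACGT", 8, name_width=width))
```

Every field after the ID shifts right as name_width grows, which is what 
would trip up a parser that reads the existing formats by column position.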

We need to decide what to do with the standard alignment formats that 
have 6 characters in their definition (I assume this goes back to the 
days of PIR database identifiers when FASTP was first written). As many 
of the current identifiers do not fit in 6 characters, we could make up 
unique short identifiers for these formats (truncate each identifier, 
and change the truncated names where two of them match so that they 
stay unique).
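
A minimal Python sketch of the truncate-and-make-unique idea follows. The 
function name make_unique_ids and the choice of a numeric suffix to break 
ties are assumptions for illustration; nothing here is existing EMBOSS 
code, and another tie-breaking scheme would work equally well.

```python
def make_unique_ids(names, width=6):
    """Truncate each name to `width` characters, keeping results unique.

    When a truncated name is already taken, the tail of the name is
    replaced by a counter (1, 2, ...) until an unused ID is found.
    The numeric suffix is an assumed scheme, not an EMBOSS convention.
    """
    used = set()
    result = []
    for name in names:
        candidate = name[:width]
        counter = 1
        while candidate in used:
            suffix = str(counter)
            if len(suffix) >= width:
                raise ValueError("too many clashing names for this width")
            # Keep as much of the original name as fits before the suffix.
            candidate = name[:width - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        result.append(candidate)
    return result


if __name__ == "__main__":
    ids = ["ENSG00000139618", "ENSG00000141510", "P12345"]
    print(make_unique_ids(ids))
```

The drawback is visible with Ensembl identifiers: the short names are 
unique but no longer tell the reader which sequence is which, so the 
output would also need to list the mapping back to the full identifiers.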

Or, should we change the existing formats to allow longer IDs? What do 
the authors of parsers think?


