[EMBOSS] display of long ensembl and vega identifiers in alignments
Peter Rice
pmr at ebi.ac.uk
Fri Aug 11 13:53:23 UTC 2006
Hans Rudolf Hotz wrote:
> A few months back, I played around with the source code and changed one
> of the library files (ajalign.c). This now allows the display of up to 20
> characters, by using a new output format "pairln" for sequence alignment
> programs, like matcher or needle. This is in comparison to the default
> which displays only the first 6 characters, or "pair" which displays the
> first 13 characters, eg:
We can make the ID arbitrarily long for a "new" alignment format. We
will need formats similar to the existing matcher and needle outputs to
avoid breaking too many existing parsers (I remember when NCBI changed
the use of a blank at the start of each line of blast output and almost
all parsers had to change). The formats are easy to make (as you found
out) from the existing ones.
We need to decide what to do with the standard alignment formats that
have 6 characters in their definition (I assume this goes back to the
days of PIR database identifiers when FASTP was first written). As many
of the existing identifiers will not fit in 6 characters, we can make up
unique identifiers for these (truncate the identifier, and alter the
truncated names where two of them would otherwise be identical).
Or, should we change the existing formats to allow longer IDs? What do
the authors of parsers think?
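To make the truncation option concrete, here is a rough sketch in Python
(not EMBOSS code, and not a proposal for how ajalign.c should do it). The
function name shorten_ids, the numeric-suffix scheme and the sample
identifiers are all assumptions made up for illustration; only the
6-character width comes from the formats discussed above.

```python
def shorten_ids(ids, width=6):
    """Return one label of at most `width` characters per input ID.

    Hypothetical helper: each ID is truncated to `width` characters; if
    that label is already taken, the tail is replaced by a counter
    (1, 2, 3, ...) until the label is unique.
    """
    used = set()
    labels = []
    for full_id in ids:
        label = full_id[:width]
        counter = 1
        while label in used:
            suffix = str(counter)
            if len(suffix) >= width:
                raise ValueError("too many colliding identifiers for this width")
            # Keep as much of the original prefix as the suffix allows.
            label = full_id[:width - len(suffix)] + suffix
            counter += 1
        used.add(label)
        labels.append(label)
    return labels


# Illustrative identifiers in the Ensembl/Vega style.
example = ["ENSG00000139618", "ENSG00000141510", "OTTHUMG00000017411", "P12345"]
print(shorten_ids(example))
# prints ['ENSG00', 'ENSG01', 'OTTHUM', 'P12345']
```

The obvious drawback is that the short labels no longer tell you which
sequence is which, so the output would also need to carry the full
identifiers somewhere (in the header, for example) for parsers to map back.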
regards,
Peter