[EMBOSS] display of long ensembl and vega identifiers in alignments

Hans Rudolf Hotz hrh at sanger.ac.uk
Fri Aug 11 12:54:09 UTC 2006


Hi

Ensembl and Vega identifiers are very long, and are therefore truncated
when displayed by alignment programs like matcher, e.g.:


cbi1b[hrh]59: matcher pep1 pep2 stdout
Finds the best local alignments between two sequences
########################################
# Program: matcher
# Rundate: Fri Aug 11 2006 13:45:51
# Align_format: markx0
# Report_file: stdout
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: OTTHUMT00000072262
# 2: ENST00000216277
# Matrix: EBLOSUM62
# Gap_penalty: 14
# Extend_penalty: 4
#
# Length: 745
# Identity:     745/745 (100.0%)
# Similarity:   745/745 (100.0%)
# Gaps:           0/745 ( 0.0%)
# Score: 3818
#
#
#=======================================

               10        20        30        40        50
OTTHUM MPFPVTTQGSQQTQPPQKHYGITSPISLAAPKETDCVLTQKLIETLKPFG
       ::::::::::::::::::::::::::::::::::::::::::::::::::
ENST00 MPFPVTTQGSQQTQPPQKHYGITSPISLAAPKETDCVLTQKLIETLKPFG
               10        20        30        40        50


A few months back, I played around with the source code and changed one
of the library files (ajalign.c). The change adds a new output format,
"pairln", for sequence alignment programs like matcher or needle, which
displays up to 20 characters of each identifier. By comparison, the
default format displays only the first 6 characters, and "pair" displays
the first 13, e.g.:


cbi1b[hrh]65: matcher pep1 pep2 stdout -aformat pairln
Finds the best local alignments between two sequences
########################################
# Program: matcher
# Rundate: Fri Aug 11 2006 13:49:41
# Align_format: pairln
# Report_file: stdout
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: OTTHUMT00000072262
# 2: ENST00000216277
# Matrix: EBLOSUM62
# Gap_penalty: 14
# Extend_penalty: 4
#
# Length: 745
# Identity:     745/745 (100.0%)
# Similarity:   745/745 (100.0%)
# Gaps:           0/745 ( 0.0%)
# Score: 3818
#
#
#=======================================

OTTHUMT00000072262        1 MPFPVTTQGSQQTQPPQKHYGITSPISLAAPKETDCVLTQKLIETLKPFG     50
                            ||||||||||||||||||||||||||||||||||||||||||||||||||
ENST00000216277           1 MPFPVTTQGSQQTQPPQKHYGITSPISLAAPKETDCVLTQKLIETLKPFG     50
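
To make the effect of the name-column width concrete, here is a minimal
Python sketch. It is not the EMBOSS code and not the change made to
ajalign.c. The widths are simply the figures quoted above (6 for the
default markx0 output, 13 for "pair", 20 for "pairln"), and the names
NAME_WIDTH and name_column are hypothetical, introduced only for
illustration.

```python
# Illustrative only: name-column widths as quoted in this message,
# not values read from the EMBOSS source.
NAME_WIDTH = {"markx0": 6, "pair": 13, "pairln": 20}


def name_column(identifier: str, aformat: str) -> str:
    """Truncate or pad an identifier to the name column of a format."""
    width = NAME_WIDTH[aformat]
    # Slice cuts long names; ljust pads short ones to a fixed column.
    return identifier[:width].ljust(width)


if __name__ == "__main__":
    for aformat in NAME_WIDTH:
        for identifier in ("OTTHUMT00000072262", "ENST00000216277"):
            print(f"{aformat:7s}|{name_column(identifier, aformat)}|")
```

With a 6-character column the two identifiers shrink to OTTHUM and
ENST00, as in the first matcher run above. Both fit untruncated only in
the 20-character column.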



Any chance something like this could make it into the distributed code?


Thanks, Hans





