[emboss-dev] Enquiry about emboss needle program customizing

Peter Rice pmr at ebi.ac.uk
Tue Jul 19 05:50:00 UTC 2011


On 19/07/2011 03:00, Tae-Kyung Kim wrote:
> I am now working for Korea Bioinformation Center, and particularly
> integrating Korean Bio-resources as National infrastructure.
>
> Now, I am urgently customizing global alignment program needle

I think we can help. I believe EMBOSS can already do what you need.

> First, I would like to get standard input sequence parameter instead of
> file name like this.
>
> ./needle -asequence ATGCATATAAA -bsequence ATAGAATAAA -gapopen 10
> -gapextend 0.5 -stdout -auto

All EMBOSS programs can do this. EMBOSS has an input format "asis" which
treats the text after asis:: as the sequence itself rather than as a
file name, so your command line becomes:

./needle -asequence asis::ATGCATATAAA -bsequence asis::ATAGAATAAA
      -gapopen 10 -gapextend 0.5 -stdout -auto

For long sequences your system needs to allow long command lines.
EMBOSS itself has no restriction on the length of the asis:: sequence,
but sometimes the shell or operating system gives an error message or
truncates the command line. For long sequences you can, of course, save
the sequence to a file and use the file name as input.
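
As a rough sketch of how a wrapper script might choose between the two
input styles, the Python below passes short sequences as asis:: and
writes longer ones to a FASTA file first. The helper names, the
MAX_ASIS_LENGTH cutoff, and the temporary file names are assumptions
made for illustration; they are not part of EMBOSS. Only the needle
qualifiers shown in the command above are used.

```python
import os
import subprocess
import tempfile

# Hypothetical cutoff: above this length the sequence goes into a file.
# Pick a value that suits the command-line limits of your own system.
MAX_ASIS_LENGTH = 10000


def sequence_argument(seq, tmpdir, name, max_asis=MAX_ASIS_LENGTH):
    """Return a value usable for -asequence or -bsequence."""
    if len(seq) <= max_asis:
        # Short enough: pass the sequence itself on the command line.
        return "asis::" + seq
    # Too long: save as FASTA and pass the file name instead.
    path = os.path.join(tmpdir, name + ".fasta")
    with open(path, "w") as handle:
        handle.write(">" + name + "\n" + seq + "\n")
    return path


def build_needle_command(aseq, bseq, tmpdir, needle="needle",
                         max_asis=MAX_ASIS_LENGTH):
    """Build the argument list for the needle call shown above."""
    return [
        needle,
        "-asequence", sequence_argument(aseq, tmpdir, "aseq", max_asis),
        "-bsequence", sequence_argument(bseq, tmpdir, "bseq", max_asis),
        "-gapopen", "10",
        "-gapextend", "0.5",
        "-stdout", "-auto",
    ]


def run_needle(aseq, bseq, needle="needle"):
    """Run needle and return its report as text (needs EMBOSS installed)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        cmd = build_needle_command(aseq, bseq, tmpdir, needle=needle)
        result = subprocess.run(cmd, capture_output=True, text=True,
                                check=True)
        return result.stdout
```

Passing the arguments as a list avoids shell quoting problems, but it
does not remove the operating system limit on total command-line
length, which is why the file fallback is kept.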

> Second, I would like to get real identity, which means the number
> of '|' in needle result.
>
>                   ATAAAAAA
>                    | | | |    | | |
>                   ATAATAAA
>     ---------------------------------------
>               Real Identity = 7

Needle already reports this information. If you look in the header of
the output you will find an Identity: line, which gives the number of
positions with a '|' in the alignment.

#=======================================
#
# Aligned_sequences: 2
# 1: asis
# 2: asis
# Matrix: EDNAFULL
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 14
# Identity:       7/14 (50.0%)
# Similarity:     7/14 (50.0%)
# Gaps:           7/14 (50.0%)
# Score: 24.0
#
#
#=======================================

asis               1 ATGCAT---ATAAA     11
                          ||   |||||
asis               1 ----ATAGAATAAA     10
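
If you need the identity count inside a script, one possible approach
is to read it from the header rather than counting '|' characters. The
Python below is a minimal sketch that assumes the header layout shown
above; the function name and the regular expression are illustrative
and not part of EMBOSS.

```python
import re

# Matches a header line such as "# Identity:       7/14 (50.0%)".
IDENTITY_RE = re.compile(
    r"^#\s*Identity:\s*(\d+)\s*/\s*(\d+)\s*\(\s*([\d.]+)%\)",
    re.MULTILINE,
)


def parse_identity(report):
    """Return (identical positions, alignment length, percent)."""
    match = IDENTITY_RE.search(report)
    if match is None:
        raise ValueError("no Identity: line found in needle output")
    return int(match.group(1)), int(match.group(2)), float(match.group(3))
```

For the report above, the first value returned would be 7, which is the
"real identity" asked for.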


You can also select an alternative output format with the -aformat 
qualifier (alignment format). Most of the alignment formats also include 
this header.

A list of alignment formats can be found on our website at 
http://emboss.open-bio.org/html/use/ch05s04.html

(This is chapter 5.4 of the new EMBOSS User's Guide book)

Hope this helps.

Peter Rice
EMBOSS Team


