[EMBOSS] Compseq DNA/Protein sequence problem

Becher, Anette anette.becher at agresearch.co.nz
Mon Apr 23 20:54:09 UTC 2007


Hi all,

I believe I *may* have found a bug in compseq.

I have been using compseq to calculate the frequency of amino acids in
translated DNA sequences. I find that compseq frequently treats the amino
acid sequence as DNA (these sequences have an unusual composition, but
then I am looking for odd proteins). So instead of the expected output
listing all amino acids, with most counts zero, I often get output for
A, C, G, T and 'Other'. I cannot see an obvious pattern that would explain
this behaviour, but maybe you can help.

Command line:

compseq -seq compseq_bug.in -word 1 -frame 1 -out compseq_bug.out

An example input file and its output are pasted below; I can provide
many more.

It might help if the user could specify whether the input sequence is
DNA or protein, rather than having the program guess.


Best wishes


Anette



Here is an example of the problem:


>Seq1
GSGGGGGSGGRGMGGWGGGRGSGVGGRGWGVG


#
# Output from 'compseq'
#
# Only words in frame 1 will be counted.
# The Expected frequencies are calculated on the (false) assumption that every
# word has equal frequency.
#
# The input sequences are:
#       Seq1


Word size       1
Total count     31

#
# Word  Obs Count       Obs Frequency   Exp Frequency   Obs/Exp Frequency
#
A       0               0.0000000       0.2500000       0.0000000
C       0               0.0000000       0.2500000       0.0000000
G       20              0.6451613       0.2500000       2.5806452
T       0               0.0000000       0.2500000       0.0000000

Other   11              0.3548387       0.0000000       10000000000.0000000




Here is a similar sequence that works fine:


>Seq2
VGSEGGGGGRRGEGGGGGGRGGGGGRWEEGAG



#
# Output from 'compseq'
#
# Only words in frame 1 will be counted.
# The Expected frequencies are calculated on the (false) assumption that every
# word has equal frequency.
#
# The input sequences are:
#       Seq2


Word size       1
Total count     31

#
# Word  Obs Count       Obs Frequency   Exp Frequency   Obs/Exp Frequency
#
A       1               0.0322581       0.0476190       0.6774194
C       0               0.0000000       0.0476190       0.0000000
D       0               0.0000000       0.0476190       0.0000000
E       4               0.1290323       0.0476190       2.7096774
F       0               0.0000000       0.0476190       0.0000000
G       20              0.6451613       0.0476190       13.5483871
H       0               0.0000000       0.0476190       0.0000000
I       0               0.0000000       0.0476190       0.0000000
K       0               0.0000000       0.0476190       0.0000000
L       0               0.0000000       0.0476190       0.0000000
M       0               0.0000000       0.0476190       0.0000000
N       0               0.0000000       0.0476190       0.0000000
P       0               0.0000000       0.0476190       0.0000000
Q       0               0.0000000       0.0476190       0.0000000
R       4               0.1290323       0.0476190       2.7096774
S       1               0.0322581       0.0476190       0.6774194
T       0               0.0000000       0.0476190       0.0000000
U       0               0.0000000       0.0476190       0.0000000
V       0               0.0000000       0.0476190       0.0000000
W       1               0.0322581       0.0476190       0.6774194
Y       0               0.0000000       0.0476190       0.0000000
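
Comparing the two examples, one pattern does stand out: every letter in Seq1 (G, S, R, M, W, V) is also a valid IUPAC nucleotide ambiguity code, whereas Seq2 contains E, which is not. If compseq's type guess accepts a sequence as nucleotide whenever all of its letters are IUPAC nucleotide codes (an assumption, not taken from the EMBOSS source), that would explain why Seq1 is read as DNA and Seq2 as protein. A minimal Python sketch of such a check, with hypothetical function names:

```python
# IUPAC nucleotide codes, including the ambiguity letters R, Y, S, W, K, M, B, D, H, V, N.
IUPAC_NUCLEOTIDE = set("ACGTURYSWKMBDHVN")


def looks_like_nucleotide(seq: str) -> bool:
    """Hypothetical heuristic (NOT the EMBOSS implementation): call a sequence
    nucleotide if every letter is a valid IUPAC nucleotide code.
    Non-letters such as gaps '-' and stops '*' are ignored."""
    letters = [c for c in seq.upper() if c.isalpha()]
    return bool(letters) and all(c in IUPAC_NUCLEOTIDE for c in letters)


def non_nucleotide_letters(seq: str) -> set:
    """Letters in seq that cannot be read as IUPAC nucleotide codes."""
    return {c for c in seq.upper() if c.isalpha() and c not in IUPAC_NUCLEOTIDE}


seq1 = "GSGGGGGSGGRGMGGWGGGRGSGVGGRGWGVG"
seq2 = "VGSEGGGGGRRGEGGGGGGRGGGGGRWEEGAG"

print(looks_like_nucleotide(seq1), sorted(non_nucleotide_letters(seq1)))  # True []
print(looks_like_nucleotide(seq2), sorted(non_nucleotide_letters(seq2)))  # False ['E']
```

Under this assumption, any protein built only from G, A, C, T, S, R, M, W, V (and the other IUPAC letters) would be misread as DNA, which fits sequences with an unusual, glycine-rich composition like these.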