[EMBOSS] Compseq DNA/Protein sequence problem
Bernd Web
bernd.web at gmail.com
Mon Apr 23 21:28:15 UTC 2007
Hi Anette,
Your seq1 is being guessed, incorrectly, to be a nucleotide sequence,
although you say it is protein. EMBOSS lets you state explicitly whether
a sequence is nucleotide or protein, so adding -sprotein1 to your command
line should make compseq treat seq1 as protein. From the EMBOSS help:
"-sequence" associated qualifiers
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
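
A possible explanation for why seq1 is affected and seq2 is not (this is an assumption about the cause, not a description of how EMBOSS actually guesses the sequence type): every letter in seq1 (G, S, R, M, W, V) is also a valid IUPAC nucleotide code, whereas seq2 contains E, which is not. That would also fit the "Other 11" row in your output, which would then be the ambiguity codes. A minimal Python sketch of that check, with a hypothetical helper name:

```python
# IUPAC nucleotide codes: the four bases, U, and the ambiguity codes.
IUPAC_NUCLEOTIDE = set("ACGTURYSWKMBDHVN")


def non_nucleotide_letters(seq):
    """Return the sorted letters in seq that are not IUPAC nucleotide codes.

    Hypothetical helper for illustration only; it is not part of EMBOSS
    and does not reproduce EMBOSS's own type detection.
    """
    return sorted(set(seq.upper()) - IUPAC_NUCLEOTIDE)


seq1 = "GSGGGGGSGGRGMGGWGGGRGSGVGGRGWGVG"
seq2 = "VGSEGGGGGRRGEGGGGGGRGGGGGRWEEGAG"

# seq1 has no letter that rules out a nucleotide reading.
print(non_nucleotide_letters(seq1))  # []
# seq2 contains E, which can only be an amino acid.
print(non_nucleotide_letters(seq2))  # ['E']
```

An empty list means the sequence cannot be told apart from nucleotide data by its alphabet alone, which is where stating the type explicitly helps.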
regards,
bernd
On 4/23/07, Becher, Anette <anette.becher at agresearch.co.nz> wrote:
> Hi all,
>
> I believe I *may* have found a bug in compseq.
>
> I have been using compseq to calculate the frequency of amino acids in
> translated DNA sequences. I find that frequently compseq takes the amino
> acid sequence to be DNA (they are sequences with an unusual composition,
> but then I am looking for odd proteins). So instead of the expected
> output for all amino acids with most being zero, I often get output for
> A,C,G,T and 'other'. I cannot see an obvious pattern that would explain
> this behaviour, but maybe you can help.
>
> Command line:
>
> compseq -seq compseq_bug.in -word 1 -frame 1 -out compseq_bug.out
>
> An example input and output file are pasted in below - I can provide
> many more.
>
> It might help if the user could specify whether the input sequence is
> DNA or protein, rather than the program working it out somehow?
>
>
> Best wishes
>
>
> Anette
>
>
>
> Here is an example of the problem:
>
>
> >Seq1
> GSGGGGGSGGRGMGGWGGGRGSGVGGRGWGVG
>
>
> #
> # Output from 'compseq'
> #
> # Only words in frame 1 will be counted.
> # The Expected frequencies are calculated on the (false) assumption that every
> # word has equal frequency.
> #
> # The input sequences are:
> # Seq1
>
>
> Word size 1
> Total count 31
>
> #
> # Word  Obs Count  Obs Frequency  Exp Frequency  Obs/Exp Frequency
> #
> A       0          0.0000000      0.2500000      0.0000000
> C       0          0.0000000      0.2500000      0.0000000
> G       20         0.6451613      0.2500000      2.5806452
> T       0          0.0000000      0.2500000      0.0000000
>
> Other   11         0.3548387      0.0000000      10000000000.0000000
>
>
>
>
> Here is a similar sequence that works fine:
>
>
> >Seq2
> VGSEGGGGGRRGEGGGGGGRGGGGGRWEEGAG
>
>
>
> #
> # Output from 'compseq'
> #
> # Only words in frame 1 will be counted.
> # The Expected frequencies are calculated on the (false) assumption that every
> # word has equal frequency.
> #
> # The input sequences are:
> # Seq2
>
>
> Word size 1
> Total count 31
>
> #
> # Word  Obs Count  Obs Frequency  Exp Frequency  Obs/Exp Frequency
> #
> A       1          0.0322581      0.0476190      0.6774194
> C       0          0.0000000      0.0476190      0.0000000
> D       0          0.0000000      0.0476190      0.0000000
> E       4          0.1290323      0.0476190      2.7096774
> F       0          0.0000000      0.0476190      0.0000000
> G       20         0.6451613      0.0476190      13.5483871
> H       0          0.0000000      0.0476190      0.0000000
> I       0          0.0000000      0.0476190      0.0000000
> K       0          0.0000000      0.0476190      0.0000000
> L       0          0.0000000      0.0476190      0.0000000
> M       0          0.0000000      0.0476190      0.0000000
> N       0          0.0000000      0.0476190      0.0000000
> P       0          0.0000000      0.0476190      0.0000000
> Q       0          0.0000000      0.0476190      0.0000000
> R       4          0.1290323      0.0476190      2.7096774
> S       1          0.0322581      0.0476190      0.6774194
> T       0          0.0000000      0.0476190      0.0000000
> U       0          0.0000000      0.0476190      0.0000000
> V       0          0.0000000      0.0476190      0.0000000
> W       1          0.0322581      0.0476190      0.6774194
> Y       0          0.0000000      0.0476190      0.0000000