[EMBOSS] Compseq DNA/Protein sequence problem
Bernd Web
bernd.web at gmail.com
Thu May 17 10:32:38 UTC 2007
Hi,
Regarding compseq, I wonder how to count only the words in reading frame 0.
For words of length 2, the frame value can be 0, 1 or 2.
I use "AGAGAG" as the sequence. With frame 1, the result is GA counted twice.
With frame 2, the result is AG counted twice.
But how do I get a count of 3 for AG only? Frame 0 returns a count
of 3 for AG, but also a count of 2 for GA.
I used EMBOSS version 4.1.0 over the web with EMBOSS Explorer.
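
To make the counts above concrete, here is a minimal Python sketch that reproduces them. It only mimics the behaviour reported in this message (frame 0 counts every overlapping word; frame 1 or 2 starts at that offset and steps by the word length). That reading is an assumption, not a description of compseq's actual code, and the function names count_words and count_from_start are made up for illustration.

```python
from collections import Counter


def count_words(seq, word=2, frame=0):
    """Count words the way the results above suggest (assumed behaviour)."""
    last_start = len(seq) - word + 1
    if frame == 0:
        # Assumption: frame 0 counts every overlapping window.
        starts = range(0, last_start)
    else:
        # Assumption: frame N starts at offset N and steps by the word length.
        starts = range(frame, last_start, word)
    return Counter(seq[i:i + word] for i in starts)


def count_from_start(seq, word=2):
    """Count non-overlapping words beginning at the first base."""
    return Counter(seq[i:i + word] for i in range(0, len(seq) - word + 1, word))


print(dict(count_words("AGAGAG", 2, 0)))    # {'AG': 3, 'GA': 2}
print(dict(count_words("AGAGAG", 2, 1)))    # {'GA': 2}
print(dict(count_words("AGAGAG", 2, 2)))    # {'AG': 2}
print(dict(count_from_start("AGAGAG", 2)))  # {'AG': 3}
```

The last call is the result I am after: three times AG and nothing else.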
regards,
bernd
On 4/23/07, Bernd Web <bernd.web at gmail.com> wrote:
> Hi Anette,
>
> Your Seq1 is being guessed to be a nucleotide sequence, although you
> state it is protein. EMBOSS provides boolean qualifiers to declare
> whether a sequence is nucleotide or protein; see the EMBOSS help:
>
>    "-sequence" associated qualifiers
>    -snucleotide1       boolean    Sequence is nucleotide
>    -sprotein1          boolean    Sequence is protein
>
> regards,
> bernd
>
> On 4/23/07, Becher, Anette <anette.becher at agresearch.co.nz> wrote:
> > Hi all,
> >
> > I believe I *may* have found a bug in compseq.
> >
> > I have been using compseq to calculate the frequency of amino acids in
> > translated DNA sequences. I find that frequently compseq takes the amino
> > acid sequence to be DNA (they are sequences with an unusual composition,
> > but then I am looking for odd proteins). So instead of the expected
> > output for all amino acids with most being zero, I often get output for
> > A,C,G,T and 'other'. I cannot see an obvious pattern that would explain
> > this behaviour, but maybe you can help.
> >
> > Command line:
> >
> > compseq -seq compseq_bug.in -word 1 -frame 1 -out compseq_bug.out
> >
> > An example input and output file are pasted in below - I can provide
> > many more.
> >
> > It might help if the user could specify whether the input sequence is
> > DNA or protein, rather than the program working it out somehow?
> >
> >
> > Best wishes
> >
> >
> > Anette
> >
> >
> >
> > Here is an example of the problem:
> >
> >
> > >Seq1
> > GSGGGGGSGGRGMGGWGGGRGSGVGGRGWGVG
> >
> >
> > #
> > # Output from 'compseq'
> > #
> > # Only words in frame 1 will be counted.
> > # The Expected frequencies are calculated on the (false) assumption that every
> > # word has equal frequency.
> > #
> > # The input sequences are:
> > # Seq1
> >
> >
> > Word size 1
> > Total count 31
> >
> > #
> > # Word   Obs Count   Obs Frequency   Exp Frequency   Obs/Exp Frequency
> > #
> > A        0           0.0000000       0.2500000       0.0000000
> > C        0           0.0000000       0.2500000       0.0000000
> > G        20          0.6451613       0.2500000       2.5806452
> > T        0           0.0000000       0.2500000       0.0000000
> >
> > Other    11          0.3548387       0.0000000       10000000000.0000000
> >
> >
> >
> >
> > Here is a similar sequence that works fine:
> >
> >
> > >Seq2
> > VGSEGGGGGRRGEGGGGGGRGGGGGRWEEGAG
> >
> >
> >
> > #
> > # Output from 'compseq'
> > #
> > # Only words in frame 1 will be counted.
> > # The Expected frequencies are calculated on the (false) assumption that every
> > # word has equal frequency.
> > #
> > # The input sequences are:
> > # Seq2
> >
> >
> > Word size 1
> > Total count 31
> >
> > #
> > # Word   Obs Count   Obs Frequency   Exp Frequency   Obs/Exp Frequency
> > #
> > A        1           0.0322581       0.0476190       0.6774194
> > C        0           0.0000000       0.0476190       0.0000000
> > D        0           0.0000000       0.0476190       0.0000000
> > E        4           0.1290323       0.0476190       2.7096774
> > F        0           0.0000000       0.0476190       0.0000000
> > G        20          0.6451613       0.0476190       13.5483871
> > H        0           0.0000000       0.0476190       0.0000000
> > I        0           0.0000000       0.0476190       0.0000000
> > K        0           0.0000000       0.0476190       0.0000000
> > L        0           0.0000000       0.0476190       0.0000000
> > M        0           0.0000000       0.0476190       0.0000000
> > N        0           0.0000000       0.0476190       0.0000000
> > P        0           0.0000000       0.0476190       0.0000000
> > Q        0           0.0000000       0.0476190       0.0000000
> > R        4           0.1290323       0.0476190       2.7096774
> > S        1           0.0322581       0.0476190       0.6774194
> > T        0           0.0000000       0.0476190       0.0000000
> > U        0           0.0000000       0.0476190       0.0000000
> > V        0           0.0000000       0.0476190       0.0000000
> > W        1           0.0322581       0.0476190       0.6774194
> > Y        0           0.0000000       0.0476190       0.0000000
> >
> > _______________________________________________
> > EMBOSS mailing list
> > EMBOSS at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/emboss
> >
>
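
A possible explanation for the misidentification in the quoted thread, offered as an assumption rather than as a description of how EMBOSS actually guesses the sequence type: every letter in Seq1 (G, S, R, M, W, V) is also a valid IUPAC nucleotide ambiguity code, whereas Seq2 contains E, which is not. The short Python sketch below checks this for the two sequences quoted above; the helper name non_nucleotide_letters is made up for illustration.

```python
# IUPAC nucleotide codes: the four bases, U, and the ambiguity symbols.
IUPAC_NUCLEOTIDE = set("ACGTURYSWKMBDHVN")


def non_nucleotide_letters(seq):
    """Return the letters in seq that are not IUPAC nucleotide codes."""
    return sorted(set(seq.upper()) - IUPAC_NUCLEOTIDE)


seq1 = "GSGGGGGSGGRGMGGWGGGRGSGVGGRGWGVG"
seq2 = "VGSEGGGGGRRGEGGGGGGRGGGGGRWEEGAG"

print(non_nucleotide_letters(seq1))  # [] : nothing rules out nucleotide
print(non_nucleotide_letters(seq2))  # ['E'] : cannot be a nucleotide sequence
```

If that is the cause, any protein made up only of such letters can be misread, and stating the type explicitly with the -sprotein1 qualifier quoted above avoids the guess.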