interpreting ajtranslate.* in ajax library.

Gary Williams, Tel 01223 494522 gwilliam at hgmp.mrc.ac.uk
Mon Aug 6 15:24:13 UTC 2001


Bill Shui wrote:
> 
> Hi there,
>     I'm using EMBOSS as part of my honours thesis. What I am doing now is
>     breaking up all the library modules and reuse bits of them to get something
>     working.
> 
>     However, I am stuck with ajtranslate or the transeq program.
> 
>     In the file ajtranslate.c, the function ajTrnReadFile uses struct
>     AjSTrn to store the EGC data (well at least that's how I understood it)
>     correct me if I was wrong.

Correct.

>     Now, I don't understand why the variable GC and Starts in
>     AjSTrn are 15 by 15 by 15 matrices?

Each codon has 3 bases, so we use a 3-dimensional array to convert the
codons to residues.

The array could be 4x4x4 for most purposes (there are four bases: A, C,
G, T), but ambiguity codes are sometimes used in positions where the
base is uncertain, e.g. 'M' stands for 'A' or 'C'. There are 15 base
codes if you include these ambiguity codes (including 'N' for a
completely unknown base). So to translate codons that contain ambiguity
codes, you really need a 15x15x15 matrix.
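
As a rough illustration of the idea (a sketch only, not the actual
ajtranslate.c code), a 15x15x15 table can be filled from the standard
genetic code by expanding each ambiguity code into the bases it can
stand for. The index numbering, the names table, resolve and
translate_codon, and the choice of 'X' for codons whose possible
translations disagree are all assumptions made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define NCODES 15  /* A, C, G, T plus 11 ambiguity codes, with N last */

/* Assumed numbering: a code's index is its position in this string. */
static const char CODES[] = "ACGTMRWSYKVHDBN";

/* Concrete bases each code can stand for, as bits: A=1, C=2, G=4, T=8. */
static const int MASK[NCODES] = {
    1, 2, 4, 8,           /* A C G T */
    3, 5, 9, 6, 10, 12,   /* M=AC R=AG W=AT S=CG Y=CT K=GT */
    7, 11, 13, 14,        /* V=ACG H=ACT D=AGT B=CGT */
    15                    /* N=ACGT */
};

/* Standard genetic code, codons in order AAA, AAC, AAG, AAT, ACA ... TTT. */
static const char STANDARD[] =
    "KNKNTTTTRSRSIIMI" "QHQHPPPPRRRRLLLL"
    "EDEDAAAAGGGGVVVV" "*Y*YSSSS*CWCLFLF";

static char table[NCODES][NCODES][NCODES];
static int table_ready = 0;

/* Residue shared by every concrete codon that (i, j, k) can stand for,
   or 'X' if those codons translate to different residues. */
static char resolve(int i, int j, int k)
{
    char res = 0;
    for (int a = 0; a < 4; a++)
        for (int b = 0; b < 4; b++)
            for (int c = 0; c < 4; c++) {
                if (!(MASK[i] & (1 << a)) || !(MASK[j] & (1 << b)) ||
                    !(MASK[k] & (1 << c)))
                    continue;               /* not a possible expansion */
                char r = STANDARD[a * 16 + b * 4 + c];
                if (res == 0)
                    res = r;                /* first possible codon */
                else if (res != r)
                    return 'X';             /* expansions disagree */
            }
    return res;
}

static void build_table(void)
{
    for (int i = 0; i < NCODES; i++)
        for (int j = 0; j < NCODES; j++)
            for (int k = 0; k < NCODES; k++)
                table[i][j][k] = resolve(i, j, k);
    table_ready = 1;
}

/* Map a base letter to its index; anything unrecognised counts as N. */
static int base_index(char base)
{
    int c = toupper((unsigned char)base);
    const char *p;
    if (c == 'U')
        c = 'T';                            /* RNA input */
    p = (c != '\0') ? strchr(CODES, c) : NULL;
    return p ? (int)(p - CODES) : NCODES - 1;
}

/* Translate the first three letters of codon with one array lookup. */
char translate_codon(const char *codon)
{
    assert(codon != NULL && strlen(codon) >= 3);
    if (!table_ready)
        build_table();
    return table[base_index(codon[0])]
                [base_index(codon[1])]
                [base_index(codon[2])];
}
```

With a table like this, a codon such as "GCN" still translates to a
single residue, because all four codons it could stand for agree, and
the cost per codon is one array lookup however ambiguous the input is.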

The Starts array is 15x15x15 for the same reason, although there are
far fewer Start codons, so this could probably have been done in a more
memory-efficient way.

>     I also do not understand the meaning of initialisation of
>     the char arrays trnconv and trncomp.

To look up an element in the 15x15x15 codon-to-residue matrix, you need
to convert each base letter to a number that can be used as an array
index. This is what trnconv[] is for. trncomp[] does the same thing, but
gives you the number of the code for the complementary base; this is
used for translating the complement of the sequence.

>     and why most of the arrays are 14?

Most of the elements of trnconv[] and trncomp[] hold '14' because this
is the code I am using for 'N' (unknown). Those elements belong to
characters that do not correspond to any recognised nucleotide code
letter (i.e. they are not one of: ACGTUMRWSYKVHDBN).
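
A minimal sketch of how two such lookup tables could be built is shown
here. It is not the code from ajtranslate.c: the names conv, comp,
base_code and comp_code are made up for the example, and the numbering
(A=0, C=1, G=2, T=3, then the ambiguity codes, with N=14) is assumed.
Every entry starts out as 14, and only the recognised letters are then
overwritten.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define UNKNOWN 14  /* index of 'N', also used for unrecognised characters */

static int conv[256];  /* character -> index of that base code */
static int comp[256];  /* character -> index of the complementary code */
static int ready = 0;

static void init_tables(void)
{
    /* Assumed numbering: index = position in codes; partners[i] is the
       complement of codes[i] (e.g. M = A or C pairs with K = G or T). */
    static const char codes[]    = "ACGTMRWSYKVHDBN";
    static const char partners[] = "TGCAKYWSRMBDHVN";

    assert(strlen(codes) == strlen(partners));

    /* Default: every character is treated as N (unknown). */
    for (int c = 0; c < 256; c++)
        conv[c] = comp[c] = UNKNOWN;

    /* Overwrite the entries for recognised letters, in both cases. */
    for (int i = 0; codes[i] != '\0'; i++) {
        int upper = (unsigned char)codes[i];
        int lower = tolower(upper);
        int partner = (int)(strchr(codes, partners[i]) - codes);
        conv[upper] = conv[lower] = i;
        comp[upper] = comp[lower] = partner;
    }

    /* U (RNA) shares T's entries. */
    conv['U'] = conv['u'] = conv['T'];
    comp['U'] = comp['u'] = comp['T'];
    ready = 1;
}

/* Index of the base code for character c. */
int base_code(char c)
{
    if (!ready)
        init_tables();
    return conv[(unsigned char)c];
}

/* Index of the code complementary to character c. */
int comp_code(char c)
{
    if (!ready)
        init_tables();
    return comp[(unsigned char)c];
}
```

To translate the reverse strand, the three letters of a codon would be
read in reverse order and passed through comp_code() instead of
base_code() before indexing the 15x15x15 matrix.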

See:
http://www.chem.qmw.ac.uk/iupac/misc/naseq.html
for details of the ambiguity codes.

> your prompt reply to this is much appreciated as I really need this for my
> thesis and I'm on a tight schedule.

Is this soon enough for you?

> thanks in advance.
> 
> regards.
> 
> Bill
> --
> The mark of a good party is that you wake up the next morning
> wanting to change your name and start a new life in different
> city.
>                 -- Vance Bourjaily, "Esquire"
> ---------------------------------------------
> Bill Shui           Email: wshui at bigpond.net.au
>                            wshui at cse.unsw.edu.au
>                            touro at capoeirabrasil.com.au
>                            bill.shui at proteomesystems.com
> Bioinformatics Programmer

-- 
Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK




More information about the emboss-dev mailing list