[Bioperl-l] protal2dna and Bio::SimpleAlign

Mon Jan 24 10:41:44 EST 2005

On Jan 24, 2005, at 10:28 AM, Catherine Letondal wrote:

>
> On Jan 23, 2005, at 3:19 PM, Jason Stajich wrote:
>
>> I'm not familiar with the script.
>
> Web:
> http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html
> Man:
> http://bioweb.pasteur.fr/docs/man/man/protal2dna.1.html
> Ftp:
> ftp://ftp.pasteur.fr/pub/GenSoft/unix/alignment/protal2dna
>
>>
>> Bio::Align::Utilities does protein to DNA mapping for an alignment 
>> with the aa_to_dna_aln function.
>
> The problem with this function aa_to_dna_aln is that it is restricted 
> to frame 1 and to the standard genetic code, right?
>        aa_to_dna_aln
>
This is an alignment mapper routine, not an alignment routine itself.  
So I think I was just being stupid and not looking at what protal2dna 
was really doing.

You provide it the protein multiple sequence alignment and the coding 
sequences which gave rise to it.  It maps the gaps back in so you have 
a CDS alignment.  It is a very basic iteration through the alignment.
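
To make the gap mapping concrete, here is a minimal sketch in plain 
Python. It is not the Bio::Align::Utilities code; the function name, the 
dict-based inputs and the example sequences are all made up for 
illustration, and it assumes each CDS is ungapped, spliced and already in 
frame +1.

```python
def aa_to_cds_aln(aa_aln, cds_by_id, gap="-"):
    """Project the gaps of an aligned protein set onto the matching CDS.

    aa_aln    : dict of id -> gapped protein string (hypothetical input shape)
    cds_by_id : dict of id -> ungapped, in-frame coding sequence
    """
    cds_aln = {}
    for seq_id, aa_row in aa_aln.items():
        cds = cds_by_id[seq_id]
        pos = 0          # index of the next unused base in the CDS
        columns = []
        for residue in aa_row:
            if residue == gap:
                # one protein gap becomes three gap characters
                columns.append(gap * 3)
            else:
                # one residue consumes the next codon
                columns.append(cds[pos:pos + 3])
                pos += 3
        cds_aln[seq_id] = "".join(columns)
    return cds_aln


# Made-up example data: seq1 lacks the third residue of seq2.
aa_aln = {"seq1": "MK-L", "seq2": "MKTL"}
cds = {"seq1": "ATGAAACTG", "seq2": "ATGAAAACCCTG"}
result = aa_to_cds_aln(aa_aln, cds)
```

Every row of the result is three times the protein alignment length, so 
the gaps fall on codon boundaries.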

So it all has to be in-frame and already spliced; it should have been 
called aa_to_cds_aln.
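
If a sequence is not in frame +1 yet, it has to be edited before the 
mapping. A rough plain-Python sketch of that preparation follows. The 
helper name and its parameters are hypothetical, and two choices are my 
assumptions rather than anything the module does: reverse-complementing 
before dropping the leading bases, and trimming a trailing partial codon.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def to_frame_plus1(dna, offset=0, reverse_strand=False):
    """Return dna edited so that its first base starts a codon.

    offset         : number of leading bases to drop (0, 1 or 2)
    reverse_strand : reverse-complement first if the CDS is on the other strand
    """
    if offset not in (0, 1, 2):
        raise ValueError("offset must be 0, 1 or 2")
    if reverse_strand:
        dna = dna.translate(_COMPLEMENT)[::-1]
    dna = dna[offset:]
    # Assumption: drop any incomplete codon left at the 3' end.
    return dna[: len(dna) - len(dna) % 3]


shifted = to_frame_plus1("GATGAAACTGA", offset=1)
flipped = to_frame_plus1("CAGTTTCAT", reverse_strand=True)
```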

The method is intended for getting ready to do Ka/Ks-type analyses, so 
that you have aligned the sequences on codon boundaries and with 
knowledge about conservative aa replacements.

apologies for inciting confusion...
-j

>         Title   : aa_to_dna_aln
>         Usage   : my $dnaaln = aa_to_dna_aln($aa_aln, \%seqs);
>         Function: Will convert an AA alignment to DNA space given the
>                   corresponding DNA sequences.  Note that this method
>                   expects the DNA sequences to be in frame +1 (GFF
>                   frame 0) as it will start to project into
>                   coordinates starting at the first base of the DNA
>                   sequence, if this alignment represents a different
>                   frame for the cDNA you will need to edit the DNA
>                   sequences to remove the 1st or 2nd bases (and
>                   revcom if things should be).
>         Returns : Bio::Align::AlignI object
>         Args    : 2 arguments, the alignment and a hashref.
>                   Alignment is a Bio::Align::AlignI of amino acid
>                   sequences.
>                   The hash reference should have keys which are
>                   the display_ids for the aa
>                   sequences in the alignment and the values are a
>                   Bio::PrimarySeqI object for the corresponding
>                   spliced cDNA sequence.
>
>
> The other problem when using tools offering several genetic codes 
> (these sequences need a bacterial genetic code) is that the start 
> codon of this code is not the right one. These sequences need GTG=M 
> (and not V).
>
>>
>> -jason
>> On Jan 22, 2005, at 4:07 PM, Maureen L Coleman wrote:
>>
>>> Hi.
>>> I'm trying to use the protal2dna script (downloaded from Pasteur 
>>> site) to convert protein alignments back to DNA alignments. It works 
>>> in some cases but not in others.  In the cases where it doesn't 
>>> work, it pulls out the same sequence twice instead of pulling out 
>>> seq1 and seq2 from my protein alignment.  Then when it tries to 
>>> match it up with the corresponding DNA sequence, it doesn't work - 
>>> it matches prot1 with dna1 (correctly) and prot1 with dna2 
>>> (incorrectly).
>>>
>>> I suspect this might be related to the name,start,end (nse) method 
>>> in Bio::SimpleAlign.  Any suggestions?
>>>
>>> Thanks,
>>> Maureen
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>> --
>> Jason Stajich
>> jason.stajich at duke.edu
>> http://www.duke.edu/~jes12/
>>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/