[Bioperl-l] protal2dna and Bio::SimpleAlign

Catherine Letondal letondal at pasteur.fr
Mon Jan 24 11:13:00 EST 2005


On Jan 24, 2005, at 4:46 PM, Marc Logghe wrote:

> Guess, this is the bioperl implementation of EMBOSS tranalign ?
> http://www.rfcgr.mrc.ac.uk/Software/EMBOSS/Apps/tranalign.html

This old script is indeed very similar to tranalign, except that it 
offers a few useful extra features:
  - you can specify a different genetic code for each DNA sequence (-G 
option)
  - you can ask for protein and DNA sequences to be matched by name 
instead of by their position in the file (-i option)
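
To illustrate the name-based mapping, here is a minimal sketch in plain 
Perl. It is not the protal2dna code: thread_gaps and the two toy hashes 
are made up for this example, and it assumes each DNA sequence is an 
ungapped, in-frame CDS of its protein.

```perl
use strict;
use warnings;

# Hypothetical helper: thread the gaps of one aligned protein sequence
# back onto its ungapped, in-frame coding sequence.
sub thread_gaps {
    my ($aligned_prot, $cds) = @_;
    my $pos = 0;    # offset of the next unused codon in $cds
    my $out = '';
    for my $residue (split //, $aligned_prot) {
        if ($residue eq '-') {
            $out .= '---';    # one protein gap column = three DNA gap columns
        }
        else {
            $out .= substr($cds, $pos, 3);
            $pos += 3;
        }
    }
    return $out;
}

# Toy data (assumed for illustration); the DNA hash is deliberately
# built in a different order than the protein alignment.
my %aligned_prot = (seqA => 'MK-L', seqB => 'M-RL');
my %cds          = (seqB => 'ATGCGTCTG', seqA => 'ATGAAACTG');

# Pair protein and DNA by name rather than by position in the file.
my %dna_aln;
for my $name (sort keys %aligned_prot) {
    die "no DNA sequence named $name\n" unless exists $cds{$name};
    $dna_aln{$name} = thread_gaps($aligned_prot{$name}, $cds{$name});
}
print "$_\t$dna_aln{$_}\n" for sort keys %dna_aln;
```

Looking sequences up by name means the order of the two input files 
does not matter, and a missing name is reported instead of silently 
pairing the wrong sequences.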

What is now missing is a feature to specify alternate start codons.
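
As a rough idea of what such a feature could do, here is a sketch in 
plain Perl (not bioperl, and not part of protal2dna). The helper 
translate_cds is hypothetical; the list of start codons is the one 
NCBI gives for translation table 11, and the sketch assumes the input 
is a complete CDS beginning at its initiator codon.

```perl
use strict;
use warnings;

# Amino acids of the standard code, codons enumerated in T, C, A, G order.
# NCBI table 11 (bacterial) assigns the same amino acids; it differs in
# which codons may initiate translation.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

my %codon_to_aa;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_to_aa{"$b1$b2$b3"} = substr($aas, $i++, 1);
        }
    }
}

# Initiator codons listed for NCBI table 11.
my %is_start = map { $_ => 1 } qw(ATG GTG TTG CTG ATT ATC ATA);

# Hypothetical helper: translate a complete CDS, reading an alternate
# start codon in first position as M.
sub translate_cds {
    my ($dna) = @_;
    $dna = uc $dna;
    my $prot = '';
    for (my $pos = 0; $pos + 3 <= length $dna; $pos += 3) {
        my $codon = substr($dna, $pos, 3);
        my $aa    = $codon_to_aa{$codon} // 'X';    # X for unknown codons
        $aa = 'M' if $pos == 0 && $is_start{$codon};
        $prot .= $aa;
    }
    return $prot;
}

print translate_cds('GTGAAAGTGTAA'), "\n";    # prints MKV*
```

Only the first codon is treated specially: an internal GTG is still 
read as V.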

BTW, I forgot to mention that the script uses the bioperl translate 
method, to which the genetic code is passed:
	my $trans = $dna->translate(undef, undef, $frame, $code);

and of course, $dna is a bioperl sequence loaded with the standard 
SeqIO methods:

$in_dna_seqs = Bio::SeqIO->newFh (-file   => $dna_file,
                                  -format => $dna_file_format);


>
> ML
>
>> -----Original Message-----
>> From: bioperl-l-bounces at portal.open-bio.org
>> [mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of
>> Jason Stajich
>> Sent: Monday, January 24, 2005 4:42 PM
>> To: Catherine Letondal
>> Cc: bioperl-l at portal.open-bio.org; Maureen L Coleman
>> Subject: Re: [Bioperl-l] protal2dna and Bio::SimpleAlign
>>
>>
>>
>> On Jan 24, 2005, at 10:28 AM, Catherine Letondal wrote:
>>
>>>
>>> On Jan 23, 2005, at 3:19 PM, Jason Stajich wrote:
>>>
>>>> I'm not familiar with the script.
>>>
>>> Web:
>>> http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html
>>> Man:
>>> http://bioweb.pasteur.fr/docs/man/man/protal2dna.1.html
>>> Ftp:
>>> ftp://ftp.pasteur.fr/pub/GenSoft/unix/alignment/protal2dna
>>>
>>>>
>>>> Bio::Align::Utilities does protein to DNA mapping for an alignment
>>>> with the aa_to_dna_aln function.
>>>
>>> The problem with this function aa_to_dna_aln is that it is
>>> restricted to frame 1 and to the standard genetic code, right?
>>>        aa_to_dna_aln
>>>
>> This is an alignment mapper routine, not an alignment routine
>> itself.
>> So I think I was just being stupid and not looking at what protal2dna
>> really was doing.
>>
>> You provide it the protein multiple sequence alignment and the
>> coding sequence which gave rise to it.  It maps the gaps back in so
>> you have a CDS alignment.  Very basic iterating through the alignment.
>>
>> So it all has to be in-frame and already spliced; it should have been
>> called aa_to_cds_aln.
>>
>> The method is intended for getting ready to do Ka/Ks type stuff so
>> that you have aligned the sequences on codon boundaries and with
>> knowledge about conservative aa replacements.
>>
>> apologies for inciting confusion...
>> -j
>>
>>>         Title   : aa_to_dna_aln
>>>         Usage   : my $dnaaln = aa_to_dna_aln($aa_aln, \%seqs);
>>>         Function: Will convert an AA alignment to DNA space given the
>>>                   corresponding DNA sequences.  Note that this method
>>>                   expects the DNA sequences to be in frame +1 (GFF
>>>                   frame 0) as it will start to project into
>>>                   coordinates starting at the first base of
>>>                   the DNA sequence, if this alignment represents a
>>>                   different frame for the cDNA you will need to edit
>>>                   the DNA sequences to remove the 1st or 2nd bases
>>>                   (and revcom if things should be).
>>>         Returns : Bio::Align::AlignI object
>>>         Args    : 2 arguments, the alignment and a hashref.
>>>                   Alignment is a Bio::Align::AlignI of amino acid
>>>                   sequences.
>>>                   The hash reference should have keys which are
>>>                   the display_ids for the aa
>>>                   sequences in the alignment and the values are a
>>>                   Bio::PrimarySeqI object for the corresponding
>>>                   spliced cDNA sequence.
>>>
>>>
>>> The other problem when using tools offering several genetic codes
>>> (these sequences need a bacterial genetic code) is that the start
>>> codon of this code is not the right one. These sequences need: GTG=M
>>> (and not V).
>>>
>>>>
>>>> -jason
>>>> On Jan 22, 2005, at 4:07 PM, Maureen L Coleman wrote:
>>>>
>>>>> Hi.
>>>>> I'm trying to use the protal2dna script (downloaded from Pasteur
>>>>> site) to convert protein alignments back to DNA alignments. It works
>>>>> in some cases but not in others.  In the cases where it doesn't
>>>>> work, it pulls out the same sequence twice instead of pulling out
>>>>> seq1 and seq2 from my protein alignment.  Then when it tries to
>>>>> match it up with the corresponding DNA sequence, it doesn't work -
>>>>> it matches prot1 with dna1 (correctly) and prot1 with dna2
>>>>> (incorrectly).
>>>>>
>>>>> I suspect this might be related to the name,start,end (nse) method
>>>>> in Bio::SimpleAlign.  Any suggestions?
>>>>>
>>>>> Thanks,
>>>>> Maureen
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at portal.open-bio.org
>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>> --
>>>> Jason Stajich
>>>> jason.stajich at duke.edu
>>>> http://www.duke.edu/~jes12/
>>>>
>>>
>>>
>> --
>> Jason Stajich
>> jason.stajich at duke.edu
>> http://www.duke.edu/~jes12/
>>
>>


