[Bioperl-l] Re: Frameshifts in alignments ... ?

Ewan Birney birney@ebi.ac.uk
Thu, 5 Sep 2002 08:05:34 +0100 (BST)


On Wed, 4 Sep 2002, Aaron J Mackey wrote:

> 
> package Bio::EncodedSeq;

I think we should go for Bio::Seq::EncodedSeq


> 
> use strict;
> use vars qw(@ISA);   # @ISA must be declared under 'use strict'
> use Bio::LocatableSeq;
> 
> @ISA = qw(Bio::LocatableSeq);
> 
> =head2 new
>  Title   : new
>  Usage   : $obj = Bio::EncodedSeq->new(-dnaseq   => "AGTACGTGTCATG",
>                                        -encoding => "CCCCCCFCCCCCC",
>                                        -id       => "myseq",
>                                        -start    => 1,
>                                        -end      => 13,
>                                        -strand   => 1
>                                       );
>  Function: creates a new Bio::EncodedSeq object from a supplied DNA
>            sequence
>  Returns : a new Bio::EncodedSeq object
>  Args    : dnaseq   - primary nucleotide sequence used to encode the
>                       protein
>            encoding - a string of characters (see Encoding Table)
>                       describing the role of each position in the
>                       DNA sequence.  Backward frameshifts implied by
>                       the encoding but not present in the sequence
>                       will be added (as '-'s) to the sequence.  If not
>                       supplied, it will be assumed that all positions
>                       are coding (C).  Encoding may include implicit
>                       phase encoding characters (e.g. "CCC") and/or
>                       explicit encoding characters (e.g. "CDE").
>                       Alternatively, encoding may be a hashref
>                       datastructure, with encoding characters as keys
>                       and Bio::LocationI objects (or arrayrefs of
>                       Bio::LocationI objects) as values, e.g.:
>                       { C => [ Bio::Location::Simple->new(1,9),
>                                Bio::Location::Simple->new(11,13) ],
>                         F => Bio::Location::Simple->new(10,10),
>                       } # same as "CCCCCCCCCFCCC"
>            id, start, end, strand - as with Bio::LocatableSeq; note
>                       that the coordinates are relative to the
>                       encoding DNA sequence, not the implicit protein
>                       sequence.
> =cut
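
A minimal sketch of how the hashref form of the encoding argument might
expand to the equivalent string form. The helper name expand_encoding is
hypothetical, and plain 1-based [start, end] pairs stand in for
Bio::LocationI objects so that the example runs without BioPerl; it is
not the proposed implementation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the proposal): expand a hashref of
# encoding characters => [start, end] ranges into a flat encoding string.
# Plain 1-based [start, end] pairs stand in for Bio::LocationI objects.
sub expand_encoding {
    my ($length, $spec) = @_;
    my @enc = ('C') x $length;    # unspecified positions default to coding
    for my $char (sort keys %$spec) {
        my $ranges = $spec->{$char};
        # accept either a single range or an arrayref of ranges
        $ranges = [$ranges] unless ref $ranges->[0];
        for my $r (@$ranges) {
            my ($start, $end) = @$r;
            $enc[$_ - 1] = $char for $start .. $end;
        }
    }
    return join '', @enc;
}

my $enc = expand_encoding(13, { C => [ [1, 9], [11, 13] ], F => [10, 10] });
print "$enc\n";    # CCCCCCCCCFCCC
```

This reproduces the "same as" string given in the hashref example for a
13-nucleotide sequence.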
> 
> =head2 encoding
>  Title   : encoding
>  Usage   : $obj->encoding("CCCCCC");
>            $obj->encoding( -encoding => { I => $location } );
>            $enc = $obj->encoding(-explicit => 1);
>            $enc = $obj->encoding("CCCCCC", -explicit => 1);
>            $enc = $obj->encoding(-location => $location,
>                                  -explicit => 1 );
>  Function: get/set the object's encoding, either globally or by location(s).
>  Returns : the (possibly new) encoding string.
>  Args    : encoding - see the encoding argument to the new() function.
>            explicit - whether or not to return explicit phase
>                       information in the encoding (e.g. "CCC" becomes
>                       "CDE", "III" becomes "IJK", etc); defaults to 0.
>            location - optional; location to get/set the encoding.
>                       Defaults to the entire sequence.
> =cut
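
A rough sketch of the implicit-to-explicit conversion that the -explicit
flag describes. Only the "CCC" to "CDE" and "III" to "IJK" mappings come
from the text above; the helper name explicit_encoding and the rule that
each character class keeps its own running phase across the whole string
are assumptions, and the effect of frameshift characters on phase is
deliberately ignored here.

```perl
use strict;
use warnings;

# Assumed phase alphabets: only the C => CDE and I => IJK triples come
# from the description above; any other character is passed through.
my %PHASES = ( C => [qw(C D E)], I => [qw(I J K)] );

# Hypothetical helper: rewrite implicit characters with explicit phase.
# Assumption: each class keeps its own running counter across the whole
# string, so coding phase resumes after an intervening intron.
sub explicit_encoding {
    my ($implicit) = @_;
    my %count;
    my $explicit = '';
    for my $char (split //, $implicit) {
        if (my $triple = $PHASES{$char}) {
            $explicit .= $triple->[ $count{$char}++ % 3 ];
        } else {
            $explicit .= $char;
        }
    }
    return $explicit;
}

print explicit_encoding("CCCCCC"), "\n";       # CDECDE
print explicit_encoding("CCCCIIICC"), "\n";    # CDECIJKDE
```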
> 
> =head2 cds
>  Title   : cds
>  Usage   : $cds = $obj->cds();
>  Function: obtain the "spliced" DNA sequence, by removing any
>            nucleotides that participate in a UTR, forward frameshift
>            or intron, and replacing any unknown nucleotide implied by
>            a backward frameshift or gap with N's.
>  Returns : a Bio::EncodedSeq object, with an encoding consisting only
>            of "CCCC..".
>  Args    : none.
> =cut
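
To make the splicing step concrete, here is a small sketch working on
plain strings rather than on objects. The helper name spliced_cds is
hypothetical, and it assumes that 'C' is the only coding character, that
every other encoding character (such as 'F') marks a nucleotide to drop,
and that a '-' in a coding column stands for an unknown base.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the coding positions of a DNA string.
# Assumptions: 'C' marks coding; any other encoding character (e.g. 'F')
# marks a nucleotide to drop; '-' in a coding column becomes 'N'.
sub spliced_cds {
    my ($dna, $encoding) = @_;
    die "length mismatch\n" unless length($dna) == length($encoding);
    my $cds = '';
    for my $i (0 .. length($dna) - 1) {
        next unless substr($encoding, $i, 1) eq 'C';
        my $nt = substr($dna, $i, 1);
        $cds .= $nt eq '-' ? 'N' : $nt;
    }
    return $cds;
}

# The sequence and encoding from the new() usage example.
print spliced_cds("AGTACGTGTCATG", "CCCCCCFCCCCCC"), "\n";    # AGTACGGTCATG
```

With the new() usage example, the single 'F' position is dropped and the
remaining twelve nucleotides form four whole codons.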
> 
> =head2 translate
>  Title   : translate
>  Usage   : $prot = $obj->translate(@args);
>  Function: obtain the protein sequence encoded by the underlying DNA
>            sequence; same as $obj->cds()->translate(@args).
>  Returns : a Bio::PrimarySeq object.
>  Args    : same as the translate() function of Bio::PrimarySeqI
> =cut
> 
> =head2 seq
>  Title   : seq
>  Usage   : $protseq = $obj->seq();
>  Function: obtain the raw protein sequence encoded by the underlying
>            DNA sequence; this is the same as calling
>            $obj->translate()->seq();
>  Returns : a string of single-letter amino acid codes
>  Args    : same as the seq() function of Bio::PrimarySeq; note that this
>            function may not be used to set the protein sequence; see
>            the dnaseq() function for that.
> =cut
> 
> =head2 dnaseq
>  Title   : dnaseq
>  Usage   : $dnaseq = $obj->dnaseq();
>            $obj->dnaseq("ACGTGTCGT", "CCCCCCCCC");
>            $obj->dnaseq(-dnaseq => "ATG",
>                         -encoding => "CCC",
>                         -location => $loc );
>  Function: get/set the underlying DNA sequence; will overwrite any
>            current DNA and/or encoding information present.
>  Returns : a string of single-letter nucleotide codes, including any
>            gaps implied by the encoding.
>  Args    : dnaseq   - the DNA sequence to be used as a replacement
>            encoding - the encoding of the DNA sequence (see the new()
>                       constructor); defaults to all 'C'.
>            location - optional, the location of the DNA sequence to
>                       get/set; defaults to the entire sequence.
> =cut
> 
> [ and all the inherited Bio::LocatableSeq and Bio::PrimarySeqI
> methods; note that the coordinates of those methods will refer only to
> the underlying DNA sequence, not the implicit encoded protein sequence
> - my next task will be to extend Ewan and Heikki's Bio::Coordinate
> system to include Bio::Coordinate::EncodedPair so that conversions can
> be made more easily ... any comments on that? ]


You are a brave man. Look forward to seeing this in...



> 
> thanks for reading,
> 
> -Aaron
> 
> -- 
>  Aaron J Mackey
>  Pearson Laboratory
>  University of Virginia
>  (434) 924-2821
>  amackey@virginia.edu
> 
> 
> 
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------