Bioperl: translating genes with >1 exon

Keith James kdj@sanger.ac.uk
09 Apr 2000 18:46:10 +0100


Hi,

I have to write scripts to process bacterial sequence, so I didn't
notice until recently that the way one has to translate a multi-exon
gene seems quite laborious.

e.g. taking a feature which is a 5-exon gene, it appears that I have
to:

Ask for its subsequences

Get the PrimarySeq object of each subsequence

Get the sequence string of each PrimarySeq object

Join the sequence strings together

Make a new PrimarySeq object from the string

Translate that, making yet another PrimarySeq object
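
To make the steps above concrete, here is a minimal sketch in plain Perl. It
deliberately uses no Bioperl classes: the parent sequence is a plain string,
each exon is a hypothetical [start, end] pair (1-based, inclusive, forward
strand only), and spliced_seq() and translate() are names I made up for
illustration, not existing methods.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real objects: a parent sequence string and
# a list of exon coordinates (1-based, inclusive, forward strand only).
my $parent = 'CCATGGCTGTAAGTTTAGAAATAAGG';
my @exons  = ([3, 8], [19, 24]);

# Join the exon substrings, ordered by start position, into one string.
sub spliced_seq {
    my ($seq, @ex) = @_;
    return join '',
        map  { substr($seq, $_->[0] - 1, $_->[1] - $_->[0] + 1) }
        sort { $a->[0] <=> $b->[0] } @ex;
}

# Standard genetic code, built from the conventional T, C, A, G ordering.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon;
my $i = 0;
for my $x (@bases) {
    for my $y (@bases) {
        for my $z (@bases) {
            $codon{"$x$y$z"} = substr($aas, $i++, 1);
        }
    }
}

# Translate in frame 0; unknown codons become 'X', a trailing partial codon
# is ignored.
sub translate {
    my ($dna) = @_;
    my $protein = '';
    for (my $p = 0; $p + 3 <= length $dna; $p += 3) {
        my $c = uc substr($dna, $p, 3);
        $protein .= exists $codon{$c} ? $codon{$c} : 'X';
    }
    return $protein;
}

my $cds = spliced_seq($parent, @exons);
print "$cds\n";
print translate($cds), "\n";
```

With real feature objects the coordinates would come from the subsequences
rather than a hard-coded list, and reverse-strand genes would need the
reverse complement taken before translating.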


I think that this is done often enough that there should be a method
for getting the combined exons from a feature. In fact, having the
translate method available on a PrimarySeq object only makes sense in
the special case of having 1 exon in your gene.

If there are no objections I would like to add a method which returns
a PrimarySeq object consisting of the concatenated subsequences,
analogous to the seq() method which currently gives the PrimarySeq
object representing both introns and exons.

I'm not quite sure how to deal with the potential multiple levels of
subsequences. I suppose there could be a method which returns a list
of integers, one for each level of subsequences, the integer being the
number of features on that 'level'.

e.g.

parent sequence  -------------
feature           -----------
sub-features       -- ---  --   level 0
sub-sub-features   -  - -  -    level 1

returns (3, 4)
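
A rough sketch of how such a count could be computed, assuming (purely for
illustration) that each feature is a hash reference holding its children
under a 'subfeatures' key. Both that layout and the name level_counts() are
hypothetical, not anything in the current code:

```perl
use strict;
use warnings;

# Walk the feature tree one level at a time and record how many features
# sit on each level below the top-level feature.
sub level_counts {
    my ($feature) = @_;
    my @counts;
    my @current = @{ $feature->{subfeatures} || [] };
    while (@current) {
        push @counts, scalar @current;
        # Move down: collect the children of everything on this level.
        @current = map { @{ $_->{subfeatures} || [] } } @current;
    }
    return @counts;
}

# The example from the diagram: 3 sub-features holding 1, 2 and 1
# sub-sub-features respectively.
my $feature = {
    subfeatures => [
        { subfeatures => [ {} ] },
        { subfeatures => [ {}, {} ] },
        { subfeatures => [ {} ] },
    ],
};

my @counts = level_counts($feature);
print "(", join(", ", @counts), ")\n";
```

A feature with no subsequences at all gives an empty list, which seems a
reasonable way to signal "nothing to combine".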

You could then ask for the sequence of level 0 (exons) as a single,
new PrimarySeq object. I wonder if anyone would want the second (or
subsequent) layers at all, much less the combined sequence of them
(maybe the individual sequences, though). However, as multiple
layers are allowed, there should be easy ways of dealing with them.

Incidentally, I found having two methods with the same name but doing
different things a bit confusing:

my $nt = $feature->seq()->seq();

where the first seq() returns a PrimarySeq object and the second one
a sequence string.

Comments?

cheers,

Keith

-- 

Keith James  --  kdj@sanger.ac.uk  --  http://www.sanger.ac.uk/Users/kdj
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA