Bioperl: translating genes with >1 exon
Keith James
kdj@sanger.ac.uk
10 Apr 2000 10:14:50 +0100
>>>>> "Ewan" == Ewan Birney <birney@ebi.ac.uk> writes:
[...]
>> I'm not quite sure how to deal with the potential multiple
>> levels of subsequences. I suppose there could be a method which
>> returns a list of integers, one for each level of subsequences,
>> the integer being the number of features on that 'level'.
Ewan> Allowing an arbitrary number of levels of nested sequence features
Ewan> is probably a big mistake in bioperl. I thought it
Ewan> was clever about 6 months ago. Now - I am not so sure.
Yes, if arbitrary nesting is condoned I think we would live to regret
it. Someone would take it to its inevitable conclusion.
Ewan> I know that in Ensembl we can definitely classify sequence features
Ewan> into three classes:
Ewan> plain sequence features (we use Bioperl
Ewan> SeqFeature::Generic)
Ewan> homol sequence features (BLAST hits - one sequence
Ewan> feature on one sequence and one sequence feature on another)
Ewan> (we use Bioperl SeqFeature::Homol)
Ewan> fsets - feature sets - one sequence feature with one
Ewan> level of sub-sequence features (Genscan predictions) (we use
Ewan> Bioperl SeqFeature::Generic with only one level of sub
Ewan> SeqFeatures).
[...]
Ewan> Options:
Ewan> a) reorganise the Sequence feature code to have
Ewan> official Fset type features with one level of sub sequence
Ewan> features. Fset could have three methods returning sequence
Ewan> objects:
Ewan>      $fset->seq (or $fset->seqobj - see naming proposal below)
Ewan>      $fset->spliced_seq
Ewan>      $fset->entire_seq
Ewan> - for backwards compatibility we would keep the
Ewan> sub_SeqFeature method, but it would always return an empty list for
Ewan> SeqFeature::Generic and SeqFeature::Homol
Ewan> b) Keep the arbitrary nesting of sequence features
Ewan> generically and come up with standards, such as the one
Ewan> suggested below, for descending the tree. (What if the tree is
Ewan> not balanced? And how does one cope with sub sequence
Ewan> features of the wrong "type" being added?)
Ewan> Personally - I am happier with (a).
I was being Devil's Advocate there, really. Trying to maintain an
arbitrary tree would be quite nasty - lots of potential variation,
special cases and breeding grounds for bugs. I prefer (a) too.
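
For concreteness, option (a) might look roughly like the Perl sketch
below. It is a hypothetical, self-contained illustration, not Bioperl's
actual API: the Fset package, its constructor arguments, and the use of
plain DNA strings in place of sequence objects are all assumptions, with
exons given as 1-based inclusive [start, end] pairs on the forward strand.

```perl
use strict;
use warnings;

# Hypothetical Fset: one parent feature with exactly one level of
# sub-features (e.g. the exons of a Genscan prediction). Plain strings
# stand in for Bioperl sequence objects; this is NOT the Bioperl API.
package Fset;

sub new {
    my ($class, %args) = @_;
    # genomic: parent DNA string
    # exons:   array ref of [start, end] pairs, 1-based inclusive, forward strand
    my @exons = sort { $a->[0] <=> $b->[0] } @{ $args{exons} };
    return bless { genomic => $args{genomic}, exons => \@exons }, $class;
}

# Whole span of the feature, from the first exon start to the last exon end
sub entire_seq {
    my ($self) = @_;
    my $start = $self->{exons}[0][0];
    my $end   = $self->{exons}[-1][1];
    return substr($self->{genomic}, $start - 1, $end - $start + 1);
}

# Exon sequences concatenated in order, i.e. with the introns removed
sub spliced_seq {
    my ($self) = @_;
    return join '',
        map { substr($self->{genomic}, $_->[0] - 1, $_->[1] - $_->[0] + 1) }
        @{ $self->{exons} };
}

# Backwards-compatible accessor: sub-features are exactly one level deep
sub sub_SeqFeature {
    my ($self) = @_;
    return @{ $self->{exons} };
}

package main;

my $fset = Fset->new(
    genomic => 'ATGCCCGTTTAAGG',
    exons   => [ [1, 3], [7, 12] ],
);
print $fset->entire_seq,  "\n";   # prints ATGCCCGTTTAA
print $fset->spliced_seq, "\n";   # prints ATGGTTTAA
```

Because the sub-features are held as a flat list, there is no tree to
descend: sub_SeqFeature hands back that single level and nothing deeper.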
If there is a $fset->spliced_seq (the name implying that the
subsequences represent exons), there should perhaps also be an
$fset->introns method to return a list of intron $seqobjs.
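
An introns method could be derived from the exon coordinates alone,
since each intron is simply the gap between two consecutive exons. The
sketch below assumes, hypothetically, plain DNA strings and sorted,
non-overlapping exons given as 1-based inclusive [start, end] pairs on
the forward strand; the introns function is illustrative, not part of
Bioperl.

```perl
use strict;
use warnings;

# Hypothetical helper (not Bioperl): return the intron sequences that lie
# between consecutive exons. Exons must be sorted and non-overlapping.
sub introns {
    my ($genomic, @exons) = @_;
    my @introns;
    for my $i (1 .. $#exons) {
        my $start = $exons[$i - 1][1] + 1;   # first base after the previous exon
        my $end   = $exons[$i][0] - 1;       # last base before the next exon
        push @introns, substr($genomic, $start - 1, $end - $start + 1)
            if $end >= $start;               # skip zero-length gaps
    }
    return @introns;
}

my @introns = introns('ATGCCCGTTTAAGG', [1, 3], [7, 12]);
print "$_\n" for @introns;   # prints CCC
```

A single-exon feature yields an empty list, which fits the idea that the
available methods should follow the biology being represented.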
I think that there should be a close correspondence between the $fset
and the biological entity it is supposed to represent. That way, if you know
the biology, you know what methods to expect (in theory!).
>> Incidentally, I found it a bit confusing to have two methods
>> with the same name that do different things:
>>
>> my $nt = $feature->seq()->seq();
>>
Ewan> Hmmm. I have seen this as well. It makes me wince as well.
Ewan> What about going $feature->seqobj() in the future?
I think that would help.
--
Keith James -- kdj@sanger.ac.uk -- http://www.sanger.ac.uk/Users/kdj
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================