Bioperl: translating genes with >1 exon

Keith James kdj@sanger.ac.uk
10 Apr 2000 10:14:50 +0100


>>>>> "Ewan" == Ewan Birney <birney@ebi.ac.uk> writes:

[...]

    >>  I'm not quite sure how to deal with the potential multiple
    >> levels of subsequences. I suppose there could be a method which
    >> returns a list of integers, one for each level of subsequences,
    >> the integer being the number of features on that 'level'.

    Ewan> Allowing there to be an arbitrary number of sequence features
    Ewan> nested is probably a big mistake in bioperl. I thought it
    Ewan> was clever about 6 months ago. Now - I am not so sure.

Yes, if arbitrary nesting is condoned I think we would live to regret
it. Someone would take it to its inevitable conclusion.

    Ewan> I know in ensembl we can definitely classify sequence features
    Ewan> into three classes:

    Ewan> 	plain sequence features (we use Bioperl
    Ewan> SeqFeature::Generic)

    Ewan> 	homol sequence features (BLAST hits - one sequence
    Ewan> feature on one sequence and one sequence feature on another)
    Ewan> (we use Bioperl SeqFeature::Homol)

    Ewan> 	fsets - feature sets - one sequence feature with one
    Ewan> level of sub sequence features (Genscan predictions) (we use
    Ewan> Bioperl SeqFeature::Generic with only one level of sub
    Ewan> SeqFeatures).

[...]

    Ewan> Options:

    Ewan> 	a) reorganise the Sequence feature code to have
    Ewan> official Fset type features with one level of sub sequence
    Ewan> features. Fset could have three methods returning sequence
    Ewan> objects

    Ewan> 	$fset->seq (or $fset->seqobj - see naming proposal
    Ewan> below).  $fset->spliced_seq $fset->entire_seq

    Ewan> 	- for backwards compatibility we would keep
    Ewan> sub_SeqFeature method but it would always return empty for
    Ewan> SeqFeature::Generic and SeqFeature::Homol

    Ewan> 	b) Keep with the arbitrary nesting of sequence features
    Ewan> generically and come up with standards such as the one
    Ewan> suggested below for descending the tree (what if the tree is
    Ewan> not balanced) and how does one cope with sub sequence
    Ewan> features of the wrong "type" being added?

    Ewan> Personally - I am happier with (a).

I was being Devil's Advocate there, really. Trying to maintain an
arbitrary tree would be quite nasty - lots of potential variation,
special cases and breeding grounds for bugs. I prefer (a) too.
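
To make option (a) concrete, here is a rough sketch of an Fset with
exactly one level of sub features. This is not Bioperl code: the
package name Sketch::Fset, the add_sub_feature method and the use of
plain strings instead of sequence objects are all my own assumptions
for illustration. I have also assumed that seq means the span from the
first sub feature to the last, spliced_seq the sub features joined
together, and entire_seq the whole parent sequence.

```perl
use strict;
use warnings;

# Hypothetical sketch only; none of this is the real Bioperl API.
package Sketch::Fset;
use List::Util qw(max);

sub new {
    my ($class, %args) = @_;
    # parent_seq: the nucleotide string the sub features are located on
    return bless { parent_seq => $args{parent_seq}, subs => [] }, $class;
}

# Sub features are plain 1-based inclusive [start, end] pairs, so there
# is no way to nest a further level beneath them.
sub add_sub_feature {
    my ($self, $start, $end) = @_;
    die "start must not exceed end\n" if $start > $end;
    die "sub feature lies outside the parent sequence\n"
        if $start < 1 || $end > length $self->{parent_seq};
    push @{ $self->{subs} }, [ $start, $end ];
    return $self;
}

# Sub features ordered by start coordinate (call in list context).
sub sub_features {
    my ($self) = @_;
    return sort { $a->[0] <=> $b->[0] } @{ $self->{subs} };
}

# Assumed meaning: everything from the first start to the last end.
sub seq {
    my ($self) = @_;
    my @subs = $self->sub_features or return '';
    my $start = $subs[0][0];
    my $end   = max map { $_->[1] } @subs;
    return substr $self->{parent_seq}, $start - 1, $end - $start + 1;
}

# Assumed meaning: the sub features concatenated in coordinate order.
sub spliced_seq {
    my ($self) = @_;
    return join '',
        map { substr $self->{parent_seq}, $_->[0] - 1, $_->[1] - $_->[0] + 1 }
        $self->sub_features;
}

# Assumed meaning: the whole sequence the Fset sits on.
sub entire_seq { return $_[0]{parent_seq} }

package main;

# Exons in upper case, everything else in lower case.
my $fset = Sketch::Fset->new(parent_seq => 'ccATGGCgtaagtagCTTAAcc');
$fset->add_sub_feature(16, 20);
$fset->add_sub_feature(3, 7);

print $fset->spliced_seq, "\n";   # ATGGCCTTAA
print $fset->seq,         "\n";   # ATGGCgtaagtagCTTAA
print $fset->entire_seq,  "\n";   # ccATGGCgtaagtagCTTAAcc
```

Because the sub features are bare coordinate pairs, the "unbalanced
tree" and "wrong type" questions from option (b) cannot arise here.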

If there is an $fset->spliced_seq (the name implying that the
subsequences represent exons), there should perhaps be an
$fset->introns to return a list of intron $seqobjs.
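
Something like the following is what I have in mind for the introns.
It is only a sketch under my own assumptions: the helper is a plain
function rather than a method, exons are 1-based inclusive
[start, end] pairs, and it returns strings where a real version would
return sequence objects. The introns are simply the gaps between
consecutive exons.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bioperl.
# Takes the parent sequence string and a list of [start, end] exon
# coordinates; returns the intron strings in coordinate order.
sub introns {
    my ($parent_seq, @exons) = @_;
    my @sorted = sort { $a->[0] <=> $b->[0] } @exons;
    my @introns;
    for my $i (1 .. $#sorted) {
        my $start = $sorted[ $i - 1 ][1] + 1;   # base after previous exon
        my $end   = $sorted[$i][0] - 1;         # base before this exon
        next if $end < $start;   # adjacent or overlapping exons: no gap
        push @introns, substr $parent_seq, $start - 1, $end - $start + 1;
    }
    return @introns;
}

# Exons in upper case, introns and trailing sequence in lower case.
my @introns = introns('ATGgtaCCCgtagTTTtaa', [1, 3], [14, 16], [7, 9]);
print "$_\n" for @introns;   # prints "gta" then "gtag"
```

A single exon gives an empty list, which seems the natural answer.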

I think that there should be a good correlation between the $fset and
a biological entity it's supposed to represent. That way, if you know
the biology, you know what methods to expect (in theory!).

    >> Incidentally, I found having two methods with the same name,
    >> but doing different things a bit confusing:
    >> 
    >> my $nt = $feature->seq()->seq();
    >> 

    Ewan> Hmmm. I have seen this as well. It makes me wince as well.

    Ewan> What about going $feature->seqobj() in the future?

I think that would help.
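
A rough sketch of how the rename might look, keeping the old name
alive for existing callers. The Sketch::* packages are invented for
illustration and are not Bioperl classes; keeping seq as an alias on
the feature is my assumption, not something anyone has decided.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence object.
package Sketch::Seq;
sub new { my ($class, $str) = @_; return bless { str => $str }, $class }
sub seq { return $_[0]{str} }         # returns the nucleotide string

# Hypothetical stand-in for a sequence feature.
package Sketch::Feature;
sub new    { my ($class, $seqobj) = @_; return bless { seqobj => $seqobj }, $class }
sub seqobj { return $_[0]{seqobj} }   # returns the sequence object
sub seq    { return $_[0]->seqobj }   # old name, kept as an alias

package main;

my $feature = Sketch::Feature->new( Sketch::Seq->new('ATGGCC') );

my $nt = $feature->seqobj->seq;       # reads as "object, then string"
print "$nt\n";                        # ATGGCC
```

With this, $feature->seq()->seq() still works, but new code can say
what it means.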

-- 

Keith James  --  kdj@sanger.ac.uk  --  http://www.sanger.ac.uk/Users/kdj
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA