[Bioperl-l] Using frame info from GFF in gettinga Seq->spliced_seq

Amir Karger akarger at CGR.Harvard.edu
Fri Dec 8 18:33:09 UTC 2006


> Another issue is the splittype() is not defined, though I 
> don't think that
> would kill anything as currently implemented.  However, one 
> thing we have
> passingly discussed is having Bio::Location::Split objects 
> possibly exhibit
> different (but expected) behaviors based upon the splittype() 
> (order, join,
> or bond).  It's one of the things I want to work out for the 
> next release.

Should I be writing -splittype => "JOIN" or some such in my new()?

-Amir Karger

> 
> chris
> 
> > Amir,
> > 
> > I don't know for sure what the problem is, but here is one 
> > possibility:
> > the number in column 8 of a GFF file is not the frame, it is 
> > the phase.
> > See the GFF3 spec for a description of what the phase is:
> > 
> >   http://www.sequenceontology.org/gff3.shtml
> > 
> > (It doesn't matter if you are using GFF3 or GFF2, as the 
> > phase is the same in both).
> > 
> > Scott
> > 
> > 
> > On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote:
> > > I need to know how to get the frame information in exon features 
> > > (created by Bio::Tools::GFF) into a whole-gene feature 
> that will be 
> > > translated into a protein.
> > > 
> > > I'm reading in some fungal GFFs generated by Jason Stajich. I
> > > 
> > > - Use Bio::Tools::GFF to create a feature for each exon in a gene
> > > - Create a Bio::Location::Split object containing each feature's 
> > > location
> > > - Create a Bio::SeqFeature::Generic object whose location 
> > is the above 
> > > BL::Split
> > > - Attach my contig Bio::Seq to the feature
> > > - get the protein with feature->spliced_seq->translate->seq
> > > 
> > > (Code below)
> > > 
> > > Unfortunately, I get the wrong result when the GFF features 
> > have frame 
> > > != 0. This happens for only a few percent of the exons, 
> but when it 
> > > does, I end up translating in the wrong frame.
> > > 
> > > If I read the docs correctly, Location objects don't have a 
> > frame. So 
> > > how do I get the correct spliced_seq, which skips one or 
> > two bp at the 
> > > beginning of certain exons?
> > > 
> > > I suspect the answer to this is that I'm going about this in 
> > > completely the wrong way, in which case, please tell me how 
> > I ought to be doing it.
> > > 
> > > Thanks,
> > > - Amir Karger
> > > Research Computing
> > > Life Sciences Division
> > > Harvard University
> > > 
> > > P.S. In case you want to see actual code, here it is. After using 
> > > Bio::Tools::GFF to create a sorted list of features for each exon 
> > > (basically stolen from the module POD), I:
> > >     # Create a new object representing the exons' gene
> > >     my $coding_loc_obj = new Bio::Location::Split;
> > >     foreach my $exon (@sorted_exons) {
> > >         $coding_loc_obj->add_sub_Location($exon->location);
> > >     }
> > > 
> > >     # Build a spliced feature representing the whole gene
> > >     my $spliced_feat = new Bio::SeqFeature::Generic(
> > >         -start  => $coding_loc_obj->start,
> > >         -end    => $coding_loc_obj->end,
> > >         -strand => $strand_num,
> > >         -primary=> "splicedGene",
> > >     );
> > >     $spliced_feat->location($coding_loc_obj);
> > > 
> > >     # Attach a contig object containing the sequence
> > >     $spliced_feat->attach_seq($contig_obj->bioperl_object);
> > > 
> > >     # Get the spliced seq and translate to protein:
> > >     my $coding_seq = $spliced_feat->spliced_seq->seq;
> > >     my $protein = $spliced_feat->spliced_seq->translate->seq;
> 
> 
> 
> 




More information about the Bioperl-l mailing list