[Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq

Chris Fields cjfields at uiuc.edu
Fri Dec 8 02:52:47 UTC 2006


Another issue is that splittype() is not defined, though I don't think that
would break anything as currently implemented.  However, one thing we have
discussed in passing is having Bio::Location::Split objects behave
differently (but predictably) depending on the splittype() (order, join,
or bond).  It's one of the things I want to work out for the next release.
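
For reference, a minimal sketch of setting the split type explicitly when
building the location. It assumes the -splittype constructor argument and
the splittype() accessor of Bio::Location::Split; the coordinates are made
up for illustration and are not from Amir's data.

```perl
use strict;
use warnings;
use Bio::Location::Simple;
use Bio::Location::Split;

# Hypothetical exon coordinates, for illustration only.
my $split = Bio::Location::Split->new(-splittype => 'join');
$split->add_sub_Location(
    Bio::Location::Simple->new(-start => 3,  -end => 8,  -strand => 1),
    Bio::Location::Simple->new(-start => 12, -end => 18, -strand => 1),
);

# Report the split type and the feature-table style location string.
printf "%s: %s\n", $split->splittype, $split->to_FTstring;
```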

If Scott's fix doesn't work and the problem persists, you should file a bug
report with some sample data for us to test out.

chris

> Amir,
> 
> I don't know for sure what the problem is, but here is one possibility:
> the number in column 8 of a GFF file is not the frame, it is the phase.
> See the GFF3 spec for a description of what the phase is:
> 
>   http://www.sequenceontology.org/gff3.shtml
> 
> (It doesn't matter whether you are using GFF3 or GFF2, as the phase is
> the same in both).
> 
> Scott
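
To make the phase idea concrete, here is a small sketch in plain Perl (no
BioPerl calls). Per the GFF3 spec, the phase is the number of bases to
remove from the start of a CDS segment to reach the first base of the next
codon. The segment sequences and the helper names in_frame_cds and
next_phase are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical CDS segments in transcript order: [ sequence, phase ].
my @segments = (
    [ 'GATGGC',  1 ],
    [ 'CAAATAA', 1 ],
);

# Join the segments, then drop the first segment's phase bases so that
# the result starts on a codon boundary.
sub in_frame_cds {
    my @segs   = @_;
    my $joined = join '', map { $_->[0] } @segs;
    return substr($joined, $segs[0][1]);
}

# Phase the following segment should carry, given the phase and length
# of the current one (assumes length >= phase).
sub next_phase {
    my ($phase, $length) = @_;
    return (3 - ($length - $phase) % 3) % 3;
}

my $cds    = in_frame_cds(@segments);
my @codons = $cds =~ /(.{3})/g;
print join(' ', @codons), "\n";    # prints "ATG GCC AAA TAA"
print "annotation is consistent\n"
    if next_phase($segments[0][1], length $segments[0][0]) == $segments[1][1];
```

If the annotation is consistent, only the first segment's phase changes
the spliced sequence; the phases of later segments follow from the segment
lengths, which is what next_phase checks. For a minus-strand gene, "first"
means first in transcript order, i.e. the exon with the highest coordinates.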
> 
> 
> On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote:
> > I need to know how to get the frame information in exon features 
> > (created by Bio::Tools::GFF) into a whole-gene feature that will be 
> > translated into a protein.
> > 
> > I'm reading in some fungal GFFs generated by Jason Stajich. I
> > 
> > - Use Bio::Tools::GFF to create a feature for each exon in a gene
> > - Create a Bio::Location::Split object containing each feature's 
> > location
> > - Create a Bio::SeqFeature::Generic object whose location is the
> > above Bio::Location::Split
> > - Attach my contig Bio::Seq to the feature
> > - get the protein with feature->spliced_seq->translate->seq
> > 
> > (Code below)
> > 
> > Unfortunately, I get the wrong result when the GFF features have
> > frame != 0. This happens for only a few percent of the exons, but
> > when it does, I end up translating in the wrong frame.
> > 
> > If I read the docs correctly, Location objects don't have a frame.
> > So how do I get the correct spliced_seq, which skips one or two bp
> > at the beginning of certain exons?
> > 
> > I suspect the answer to this is that I'm going about this in
> > completely the wrong way, in which case, please tell me how I ought
> > to be doing it.
> > 
> > Thanks,
> > - Amir Karger
> > Research Computing
> > Life Sciences Division
> > Harvard University
> > 
> > P.S. In case you want to see actual code, here it is. After using 
> > Bio::Tools::GFF to create a sorted list of features for each exon 
> > (basically stolen from the module POD), I:
> >     # Create a new object representing the exons' gene
> >     my $coding_loc_obj = Bio::Location::Split->new;
> >     foreach my $exon (@sorted_exons) {
> >         $coding_loc_obj->add_sub_Location($exon->location);
> >     }
> > 
> >     # Build a spliced feature representing the whole gene
> >     my $spliced_feat = Bio::SeqFeature::Generic->new(
> >         -start  => $coding_loc_obj->start,
> >         -end    => $coding_loc_obj->end,
> >         -strand => $strand_num,
> >         -primary=> "splicedGene",
> >     );
> >     $spliced_feat->location($coding_loc_obj);
> > 
> >     # Attach a contig object containing the sequence
> >     $spliced_feat->attach_seq($contig_obj->bioperl_object);
> > 
> >     # Get the spliced seq and translate to protein:
> >     my $coding_seq = $spliced_feat->spliced_seq->seq;
> >     my $protein = $spliced_feat->spliced_seq->translate->seq;
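
One possible adjustment to the code above, offered as a sketch rather than
a tested fix: leave the exon locations as they are and pass the phase of
the first exon to translate(). It assumes that Bio::Tools::GFF stores
column 8 in $exon->frame, that translate() accepts a named -frame argument
(this may depend on the BioPerl release), and that the gene is on the plus
strand. The contig sequence and coordinates are made up for illustration.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::Location::Split;
use Bio::SeqFeature::Generic;

# Hypothetical contig with two plus-strand exons (3..8 and 12..18).
my $contig = Bio::Seq->new(-id => 'contig1', -seq => 'ccGATGGCtttCAAATAAgg');
my @sorted_exons = map {
    Bio::SeqFeature::Generic->new(
        -start   => $_->[0], -end   => $_->[1], -strand => 1,
        -primary => 'CDS',   -frame => $_->[2],
    )
} ( [ 3, 8, 1 ], [ 12, 18, 1 ] );

# Same construction as in the original post.
my $coding_loc_obj = Bio::Location::Split->new;
$coding_loc_obj->add_sub_Location($_->location) for @sorted_exons;

my $spliced_feat = Bio::SeqFeature::Generic->new(-primary => 'splicedGene');
$spliced_feat->location($coding_loc_obj);
$spliced_feat->attach_seq($contig);

# Column 8 of the first exon in transcript order; treat '.' as phase 0.
my $phase = $sorted_exons[0]->frame;
$phase = 0 unless defined $phase && $phase =~ /^[012]$/;

# Skip $phase bases of the spliced sequence before translating.
my $protein = $spliced_feat->spliced_seq->translate(-frame => $phase)->seq;
print "$protein\n";
```

The phases of the internal exons are not used here; if the GFF is
internally consistent they are implied by the exon lengths, so only the
first exon's value should affect the reading frame of the spliced sequence.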





