[Bioperl-l] Using frame info from GFF in gettinga Seq->spliced_seq
Chris Fields
cjfields at uiuc.edu
Fri Dec 8 02:52:47 UTC 2006
Another issue is the splittype() is not defined, though I don't think that
would kill anything as currently implemented. However, one thing we have
passingly discussed is having Bio::Location::Split objects possibly exhibit
different (but expected) behaviors based upon the splittype() (order, join,
or bond). It's one of the things I want to work out for the next release.
If Scott's fix doesn't work and the problem persists, you should file a bug
report with some sample data for us to test out.
chris
> Amir,
>
> I don't know for sure what the problem is, but here is one
> possibility:
> the number in column 8 of a GFF file is not the frame, it is
> the phase.
> See the GFF3 spec for a description of what the phase is:
>
> http://www.sequenceontology.org/gff3.shtml
>
> (It doesn't matter if you are using GFF3 or GFF2, as the
> phase is the same in both).
>
> Scott
>
>
> On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote:
> > I need to know how to get the frame information in exon features
> > (created by Bio::Tools::GFF) into a whole-gene feature that will be
> > translated into a protein.
> >
> > I'm reading in some fungal GFFs generated by Jason Stajich. I
> >
> > - Use Bio::Tools::GFF to create a feature for each exon in a gene
> > - Create a Bio::Location::Split object containing each feature's
> > location
> > - Create a Bio::SeqFeature::Generic object whose location
> is the above
> > BL::Split
> > - Attach my contig Bio::Seq to the feature
> > - get the protein with feature->spliced_seq->translate->seq
> >
> > (Code below)
> >
> > Unfortunately, I get the wrong result when the GFF features
> have frame
> > != 0. This happens for only a few percent of the exons, but when it
> > does, I end up translating in the wrong frame.
> >
> > If I read the docs correctly, Location objects don't have a
> frame. So
> > how do I get the correct spliced_seq, which skips one or
> two bp at the
> > beginning of certain exons?
> >
> > I suspect the answer to this is that I'm going about this in
> > completely the wrong way, in which case, please tell me how
> I ought to be doing it.
> >
> > Thanks,
> > - Amir Karger
> > Research Computing
> > Life Sciences Division
> > Harvard University
> >
> > P.S. In case you want to see actual code, here it is. After using
> > Bio::Tools::GFF to create a sorted list of features for each exon
> > (basically stolen from the module POD), I:
> > # Create a new object representing the exons' gene
> > my $coding_loc_obj = new Bio::Location::Split;
> > foreach my $exon (@sorted_exons) {
> > $coding_loc_obj->add_sub_Location($exon->location);
> > }
> >
> > # Build a spliced feature representing the whole gene
> > my $spliced_feat = new Bio::SeqFeature::Generic(
> > -start => $coding_loc_obj->start,
> > -end => $coding_loc_obj->end,
> > -strand => $strand_num,
> > -primary=> "splicedGene",
> > );
> > $spliced_feat->location($coding_loc_obj);
> >
> > # Attach a contig object containing the sequence
> > $spliced_feat->attach_seq($contig_obj->bioperl_object);
> >
> > # Get the spliced seq and translate to protein:
> > my $coding_seq = $spliced_feat->spliced_seq->seq;
> > my $protein = $spliced_feat->spliced_seq->translate->seq;
More information about the Bioperl-l
mailing list