[Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
Scott Cain
cain at cshl.edu
Thu Dec 7 22:46:09 UTC 2006
Amir,
I don't know for sure what the problem is, but here is one possibility:
the number in column 8 of a GFF file is not the frame; it is the phase,
i.e. the number of bases to remove from the start of the feature to reach
the first base of the next codon. See the GFF3 spec for the full
description:
http://www.sequenceontology.org/gff3.shtml
(It doesn't matter whether you are using GFF3 or GFF2; phase means the
same thing in both.)
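A minimal sketch of what the phase column implies, assuming the features are a single gene's CDS segments listed in transcript (5' to 3') order: once the first segment's phase is known, every later segment's phase is fixed by the lengths of the segments before it. `expected_phases` is a hypothetical helper written for illustration; it is not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API).
# Given the lengths of a gene's CDS segments in transcript (5'->3') order and
# the phase of the first segment, return the phase each segment should carry.
# Phase = number of bases to skip at the start of a segment to reach the
# first base of the next complete codon.
sub expected_phases {
    my ($lengths, $first_phase) = @_;
    my @phases = ($first_phase);
    # Coding bases used so far, not counting the skipped leading bases.
    my $consumed = $lengths->[0] - $first_phase;
    for my $i (1 .. $#$lengths) {
        # An unfinished codon at the end of the previous segment is completed
        # by the first (3 - leftover) % 3 bases of this one.
        push @phases, (3 - $consumed % 3) % 3;
        $consumed += $lengths->[$i];
    }
    return @phases;
}

my @p = expected_phases([10, 8, 7], 0);
print "@p\n";    # prints "0 2 0"
```

A 10 bp first segment leaves one base of an unfinished codon, so the second segment must spend 2 bases completing it. If the column-8 values in a file agree with this calculation, concatenating the segments and skipping only the first segment's phase should give an in-frame coding sequence; a segment that disagrees suggests inconsistent annotation.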
Scott
On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote:
> I need to know how to get the frame information in exon features
> (created by Bio::Tools::GFF) into a whole-gene feature that will be
> translated into a protein.
>
> I'm reading in some fungal GFFs generated by Jason Stajich. I:
>
> - Use Bio::Tools::GFF to create a feature for each exon in a gene
> - Create a Bio::Location::Split object containing each feature's
> location
> - Create a Bio::SeqFeature::Generic object whose location is the above
> Bio::Location::Split
> - Attach my contig Bio::Seq to the feature
> - get the protein with feature->spliced_seq->translate->seq
>
> (Code below)
>
> Unfortunately, I get the wrong result when the GFF features have frame
> != 0. This happens for only a few percent of the exons, but when it
> does, I end up translating in the wrong frame.
>
> If I read the docs correctly, Location objects don't have a frame. So
> how do I get the correct spliced_seq, which skips one or two bp at the
> beginning of certain exons?
>
> I suspect the answer to this is that I'm going about this in completely
> the wrong way, in which case, please tell me how I ought to be doing it.
>
> Thanks,
> - Amir Karger
> Research Computing
> Life Sciences Division
> Harvard University
>
> P.S. In case you want to see actual code, here it is. After using
> Bio::Tools::GFF to create a sorted list of features for each exon
> (basically stolen from the module POD), I:
> use Bio::Location::Split;
> use Bio::SeqFeature::Generic;
>
> # Create a new object representing the exons' gene.
> # Note: Location objects carry no phase, so each exon's column-8 value
> # is not transferred by this loop.
> my $coding_loc_obj = Bio::Location::Split->new;
> foreach my $exon (@sorted_exons) {
>     $coding_loc_obj->add_sub_Location($exon->location);
> }
>
> # Build a spliced feature representing the whole gene
> my $spliced_feat = Bio::SeqFeature::Generic->new(
>     -start   => $coding_loc_obj->start,
>     -end     => $coding_loc_obj->end,
>     -strand  => $strand_num,
>     -primary => "splicedGene",
> );
> $spliced_feat->location($coding_loc_obj);
>
> # Attach a contig object containing the sequence
> $spliced_feat->attach_seq($contig_obj->bioperl_object);
>
> # Get the spliced seq once, then translate it to protein
> my $spliced    = $spliced_feat->spliced_seq;
> my $coding_seq = $spliced->seq;
> my $protein    = $spliced->translate->seq;
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
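Given the GFF3 definition of phase (bases to skip at the start of a feature to reach the next codon), the question of which bases to skip has a simpler answer than trimming individual exons. Concatenate the CDS segments in transcript order and drop only the first segment's phase bases before translating. Below is a minimal pure-Perl sketch of that idea, independent of BioPerl. `spliced_cds` is hypothetical. It assumes every segment is a CDS feature (not a whole exon with UTR), that all segments lie on one contig and strand, and that coordinates are 1-based and inclusive as in GFF.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): build a coding sequence from CDS
# segments on one contig.
#   $contig - contig sequence as a plain string
#   $segs   - array ref of [start, end] pairs, 1-based inclusive (GFF style)
#   $strand - +1 or -1
#   $phase  - column-8 phase of the 5'-most segment in transcript order
sub spliced_cds {
    my ($contig, $segs, $strand, $phase) = @_;
    # Transcript order: ascending start on the plus strand, descending on minus.
    my @ordered = sort { $a->[0] <=> $b->[0] } @$segs;
    @ordered = reverse @ordered if $strand < 0;
    my $cds = '';
    for my $s (@ordered) {
        my $piece = substr($contig, $s->[0] - 1, $s->[1] - $s->[0] + 1);
        if ($strand < 0) {
            # Reverse-complement so the piece reads 5'->3' on the coding strand.
            $piece = reverse $piece;
            $piece =~ tr/ACGTacgtNn/TGCAtgcaNn/;
        }
        $cds .= $piece;
    }
    # Skip leading bases that belong to a codon outside these segments,
    # so the returned string can be translated starting at its first base.
    return substr($cds, $phase);
}

# Two segments separated by a 3 bp intron ("TTT") on the plus strand.
print spliced_cds("GGATGAAATTTCCCTAAGG", [[3, 8], [12, 17]], 1, 0), "\n";
# prints ATGAAACCCTAA
```

For minus-strand features the phase is counted from the feature's end coordinate, which is why the sketch reverse-complements each piece before concatenating and then trims from the front. With BioPerl objects, the first segment's column-8 value should be available from that feature's `frame` method on Bio::SeqFeature::Generic.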
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory