[Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq

Jason Stajich jason at bioperl.org
Fri Dec 8 02:01:33 UTC 2006


I suspect this was a problem in the gene prediction output; more
recent versions of the program should have fixed it.  I do not
currently have free time to deal with the errors in the small number
of ORFs where this has happened.

I think you just need to trim the frame off the 5' end of each 1st exon:
  start += frame   (+ strand)
  end   -= frame   (- strand)
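
A minimal sketch of that adjustment in plain Perl. It reads the fix as
trimming `frame` bases from the 5' end of a first exon, assuming the GFF
frame column holds the number of bases to skip before the first complete
codon and that coordinates are 1-based with start <= end on both strands.
The helper name trim_first_exon is hypothetical, not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): trim the GFF frame off
# the 5' end of a first exon. Coordinates are 1-based and inclusive,
# with start <= end regardless of strand, as in GFF.
sub trim_first_exon {
    my ($start, $end, $strand, $frame) = @_;

    # Treat a missing frame or the GFF placeholder '.' as frame 0.
    $frame = 0 if !defined $frame || $frame eq '.';

    if ($strand >= 0) {
        $start += $frame;    # 5' end is the low coordinate
    }
    else {
        $end -= $frame;      # 5' end is the high coordinate
    }
    return ($start, $end);
}

my @plus  = trim_first_exon(101, 250,  1, 2);
my @minus = trim_first_exon(101, 250, -1, 1);
print "@plus\n";     # 103 250
print "@minus\n";    # 101 249
```

With Bio::Tools::GFF features, the same arithmetic would be applied to the
first exon's location before adding it to the split location.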

Alternatively, you can probably pass the 1st exon's frame to the
translate function, but depending on your downstream analyses you
should try to get the CDS correct first.
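
A small sketch of the translate route, assuming BioPerl is installed and
is a release whose translate() accepts named arguments, including -frame
(0, 1, or 2 bases to skip before the first codon). The sequence is a
made-up toy CDS, not data from the GFF files discussed here.

```perl
use strict;
use warnings;
use Bio::Seq;

# Toy spliced CDS whose first exon has frame 1: one stray base ("G")
# precedes the first complete codon (ATG GCC TAA).
my $cds = Bio::Seq->new(-seq => 'GATGGCCTAA', -alphabet => 'dna');

# Frame of the first exon, as it would be read from the GFF line.
my $frame = 1;

# -frame tells translate() how many leading bases to skip.
my $protein = $cds->translate(-frame => $frame)->seq;
print "$protein\n";
```

This leaves the feature coordinates untouched, so the spliced sequence
itself still carries the extra leading base; only the translation is
shifted.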

-jason

On Dec 7, 2006, at 1:32 PM, Amir Karger wrote:

> I need to know how to get the frame information in exon features
> (created by Bio::Tools::GFF) into a whole-gene feature that will be
> translated into a protein.
>
> I'm reading in some fungal GFFs generated by Jason Stajich. I
>
> - Use Bio::Tools::GFF to create a feature for each exon in a gene
> - Create a Bio::Location::Split object containing each feature's
> location
> - Create a Bio::SeqFeature::Generic object whose location is the above
> BL::Split
> - Attach my contig Bio::Seq to the feature
> - get the protein with feature->spliced_seq->translate->seq
>
> (Code below)
>
> Unfortunately, I get the wrong result when the GFF features have frame
> != 0. This happens for only a few percent of the exons, but when it
> does, I end up translating in the wrong frame.
>
> If I read the docs correctly, Location objects don't have a frame. So
> how do I get the correct spliced_seq, which skips one or two bp at the
> beginning of certain exons?
>
> I suspect the answer to this is that I'm going about this in completely
> the wrong way, in which case, please tell me how I ought to be doing it.
>
> Thanks,
> - Amir Karger
> Research Computing
> Life Sciences Division
> Harvard University
>
> P.S. In case you want to see actual code, here it is. After using
> Bio::Tools::GFF to create a sorted list of features for each exon
> (basically stolen from the module POD), I:
>     use Bio::Location::Split;
>     use Bio::SeqFeature::Generic;
>
>     # Create a new object representing the exons' gene
>     my $coding_loc_obj = Bio::Location::Split->new;
>     foreach my $exon (@sorted_exons) {
>         $coding_loc_obj->add_sub_Location($exon->location);
>     }
>
>     # Build a spliced feature representing the whole gene
>     my $spliced_feat = Bio::SeqFeature::Generic->new(
>         -start   => $coding_loc_obj->start,
>         -end     => $coding_loc_obj->end,
>         -strand  => $strand_num,
>         -primary => "splicedGene",
>     );
>     $spliced_feat->location($coding_loc_obj);
>
>     # Attach a contig object containing the sequence
>     $spliced_feat->attach_seq($contig_obj->bioperl_object);
>
>     # Get the spliced seq once, then translate it to protein
>     my $spliced_seq = $spliced_feat->spliced_seq;
>     my $coding_seq  = $spliced_seq->seq;
>     my $protein     = $spliced_seq->translate->seq;
>