[Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
Amir Karger
akarger at CGR.Harvard.edu
Thu Dec 7 21:32:51 UTC 2006
I need to know how to get the frame information in exon features
(created by Bio::Tools::GFF) into a whole-gene feature that will be
translated into a protein.
I'm reading in some fungal GFF files generated by Jason Stajich. My
current approach is to:
- Use Bio::Tools::GFF to create a feature for each exon in a gene
- Create a Bio::Location::Split object containing each feature's
location
- Create a Bio::SeqFeature::Generic object whose location is the above
Bio::Location::Split
- Attach my contig Bio::Seq to the feature
- Get the protein with $feature->spliced_seq->translate->seq
(Code below)
Unfortunately, I get the wrong result when the GFF features have frame
!= 0. This happens for only a few percent of the exons, but when it
does, I end up translating in the wrong frame.
If I read the docs correctly, Location objects don't have a frame. So
how do I get the correct spliced_seq, which skips one or two bp at the
beginning of certain exons?
I suspect the answer to this is that I'm going about this in completely
the wrong way, in which case, please tell me how I ought to be doing it.
Thanks,
- Amir Karger
Research Computing
Life Sciences Division
Harvard University
P.S. In case you want to see actual code, here it is. After using
Bio::Tools::GFF to create a sorted list of features for each exon
(basically stolen from the module POD), I:
    use Bio::Location::Split;
    use Bio::SeqFeature::Generic;

    # Create a new object representing the exons' gene
    my $coding_loc_obj = Bio::Location::Split->new;
    foreach my $exon (@sorted_exons) {
        $coding_loc_obj->add_sub_Location($exon->location);
    }

    # Build a spliced feature representing the whole gene
    my $spliced_feat = Bio::SeqFeature::Generic->new(
        -start   => $coding_loc_obj->start,
        -end     => $coding_loc_obj->end,
        -strand  => $strand_num,
        -primary => "splicedGene",
    );
    $spliced_feat->location($coding_loc_obj);

    # Attach a contig object containing the sequence
    $spliced_feat->attach_seq($contig_obj->bioperl_object);

    # Get the spliced seq once, then translate it to protein
    my $spliced    = $spliced_feat->spliced_seq;
    my $coding_seq = $spliced->seq;
    my $protein    = $spliced->translate->seq;
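
For illustration only, here is a minimal pure-Perl sketch of one way the frame column could be applied. It is not BioPerl code and not a confirmed answer to the question above. It assumes that the GFF frame value gives the number of bases to skip at the 5' end of a feature before the first complete codon, and that for a spliced CDS only the first exon (in transcript order) needs trimming, because the splice itself keeps later exons in frame. The exon hashes, the helper names revcomp and spliced_cds, and the toy contig are all hypothetical.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string (upper or lower case).
sub revcomp {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Build the coding sequence from a contig string and a list of exons.
# Each exon is a hash with 1-based inclusive start/end and a GFF frame
# (0, 1, 2, or '.'). $strand is +1 or -1.
# ASSUMPTION: only the first exon's frame shifts the reading frame.
sub spliced_cds {
    my ($contig, $strand, @exons) = @_;

    # Order exons 5' -> 3' along the transcript.
    my @ordered = sort { $a->{start} <=> $b->{start} } @exons;
    @ordered = reverse @ordered if $strand < 0;

    my $cds = '';
    for my $exon (@ordered) {
        my $len   = $exon->{end} - $exon->{start} + 1;
        my $piece = substr($contig, $exon->{start} - 1, $len);
        $piece = revcomp($piece) if $strand < 0;
        $cds .= $piece;
    }

    # Skip the partial codon at the start of the first exon, if any.
    my $phase = $ordered[0]{frame};
    $phase = 0 unless defined $phase && $phase =~ /^[012]$/;
    return substr($cds, $phase);
}

# Toy contig: two exons (upper case) separated by an intron (lower case).
my $contig = 'CATGGCCgtacagAAATGA';
my @exons  = (
    { start => 1,  end => 7,  frame => 1 },   # one extra base before ATG
    { start => 14, end => 19, frame => 0 },
);
print spliced_cds($contig, 1, @exons), "\n";
```

Within BioPerl, the same idea might be expressed by passing the first exon's frame to translate, for example $spliced_feat->spliced_seq->translate(-frame => $first_exon->frame), assuming frame returns 0, 1 or 2 for that exon and that $first_exon (a hypothetical variable) is the 5'-most exon on the gene's strand.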