[Bioperl-l] Bio::DB::GFF, aggregators, and spliced_seq
Allen Gathman
agathman at semo.edu
Wed Aug 24 09:44:47 EDT 2005
Well, I appear to have fixed it myself, although in a kind of inelegant way
-- I pulled the display_id of the predicted gene I wanted, then used it in a
get_feature_by_name call to pull the feature out again.
@new_genes=$db->get_feature_by_name( Sequence => $gid);
$ngene = shift (@new_genes);
etc.
That "re-captured" feature ("$ngene" above) splices correctly when I use
spliced_seq on it. I'm still a bit puzzled why the original code doesn't
get me the whole gene.
Allen Gathman
http://cstl-csm.semo.edu/gathman
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org [mailto:bioperl-l-
> bounces at portal.open-bio.org] On Behalf Of Gathman, Allen
> Sent: Tuesday, August 23, 2005 1:39 PM
> To: 'bioperl-l at portal.open-bio.org'
> Subject: [Bioperl-l] Bio::DB::GFF, aggregators, and spliced_seq
>
> Hi, BioPerl gurus:
>
>
>
> Although this question involves a Gbrowse database, I think it's actually
> a
> BioPerl question at heart - and in any case it appears that there's a lot
> of
> overlap between the people who answer questions in both lists, so I'm
> guessing this is a good place for this question.
>
>
>
> I've written a script that finds particular pfam hits in a GBROWSE
> database,
> then uses "overlapping_features" to find predicted gene features of type
> "transcript:GLEAN" that overlap those pfams. I've set the aggregator
> "transcript" as {CDS/mRNA} already. I select features using a regular
> expression to choose particular names, then I use spliced_seq to return
> the
> spliced CDS of each feature - but I'm only getting back the CDS that
> actually overlap the pfam hit, not the full predicted gene. So my
> question
> is, what do I need to do in order to get ALL the CDS of each predicted
> gene
> feature spliced together, instead of only the ones that actually overlap
> the
> pfam hit I used to select that predicted gene?
>
>
>
> Thanks in advance for any help you can give...
>
>
>
> Here's the code:
>
>
>
> #!/usr/bin/perl
>
>
>
> use strict;
>
> use Bio::DB::GFF;
>
> use Bio::Seq;
>
> use Bio::SeqIO;
>
> use Getopt::Long;
>
>
>
> my $outfile;
>
> GetOptions(
>
> 'o|outfile=s' => \$outfile,
>
> );
>
>
>
> my $outfa= Bio::SeqIO -> new (-file => ">$outfile",
>
> -format => 'Fasta'
>
> );
>
>
>
> my $db = Bio::DB::GFF-> new ( - adaptor => 'dbi::mysql',
>
> -dsn =>
> 'dbi:mysql:database=cc;host=localhost',
>
> -fasta => '/gbrowse/databases/cc'
>
> );
>
>
>
> $db->add_aggregator('transcript{CDS/mRNA}');
>
>
>
> for (my $i =1; $i<=20; $i++){
>
> my $pfamname="Peptidase_C$i";
>
> my @pfams = $db->get_feature_by_name( Domain => $pfamname);
>
> foreach my $pfamhit (@pfams){
>
> my $desc = $pfamname;
>
> my $score=$pfamhit->score;
>
> my $name = $pfamhit->name;
>
> $desc.= " $score ";
>
> $desc.= $pfamhit->location->seq_id();
>
> $desc.= ": ";
>
> #
>
> # Here's where I'm selecting predicted genes that overlap the Pfam hit
>
> #
>
> my @genes = $db -> overlapping_features(
>
> -refseq => $pfamhit->location->seq_id,
>
> -start => $pfamhit->start,
>
> -stop => $pfamhit->stop,
>
> -types =>'transcript:GLEAN'
>
> );
>
> #
>
> # Now I'm choosing the ones with names I want out of the selected genes
>
> #
>
> foreach my $gene (@genes){
>
> my $gid=$gene->display_id();
>
> if ($gid =~/aug_GLEAN/){
>
> $desc.=$gene->start;
>
> $desc.=" - ";
>
> $desc.=$gene->stop;
>
> #
>
> # Here I'm splicing the gene, tacking on a description, and outputting it.
>
> #
>
>
>
> my $splseq = $gene->spliced_seq();
>
> $splseq->desc($desc);
>
> $splseq->display_id($gid);
>
> $outfa->write_seq($splseq);
>
>
>
> }# end if aug_GLEAN
>
> }# end foreach gene
>
> }# end foreach pfamhit
>
> } # end for numbers
>
> close OUT;
>
>
>
>
>
> Allen Gathman
>
> http://cstl-csm.semo.edu/gathman
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list