[Bioperl-l] Bio::DB::GFF, aggregators, and spliced_seq

Wed Aug 24 09:44:47 EDT 2005

Well, I appear to have fixed it myself, although in a kind of inelegant way
-- I pulled the display_id of the predicted gene I wanted, then used it in a
get_feature_by_name call to pull the feature out again.  

@new_genes=$db->get_feature_by_name( Sequence => $gid);
$ngene = shift (@new_genes);
etc. 

That "re-captured" feature ("$ngene" above) splices correctly when I use
spliced_seq on it.  I'm still a bit puzzled why the original code doesn't
get me the whole gene. 

Allen Gathman
http://cstl-csm.semo.edu/gathman

> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org [mailto:bioperl-l-
> bounces at portal.open-bio.org] On Behalf Of Gathman, Allen
> Sent: Tuesday, August 23, 2005 1:39 PM
> To: 'bioperl-l at portal.open-bio.org'
> Subject: [Bioperl-l] Bio::DB::GFF, aggregators, and spliced_seq
> 
> Hi, BioPerl gurus:
> 
> 
> 
> Although this question involves a Gbrowse database, I think it's actually
> a
> BioPerl question at heart - and in any case it appears that there's a lot
> of
> overlap between the people who answer questions in both lists, so I'm
> guessing this is a good place for this question.
> 
> 
> 
> I've written a script that finds particular pfam hits in a GBROWSE
> database,
> then uses "overlapping_features" to find predicted gene features of  type
> "transcript:GLEAN" that overlap those pfams. I've set the aggregator
> "transcript" as {CDS/mRNA} already.  I select features using a regular
> expression to choose particular names, then I use spliced_seq to return
> the
> spliced CDS of each feature - but I'm only getting back the CDS that
> actually overlap the pfam hit, not the full predicted gene.  So my
> question
> is, what do I need to do in order to get ALL the CDS of each predicted
> gene
> feature spliced together, instead of only the ones that actually overlap
> the
> pfam hit I used to select that predicted gene?
> 
> 
> 
> Thanks in advance for any help you can give...
> 
> 
> 
> Here's the code:
> 
> 
> 
> #!/usr/bin/perl
> 
> 
> 
> use strict;
> 
> use Bio::DB::GFF;
> 
> use Bio::Seq;
> 
> use Bio::SeqIO;
> 
> use Getopt::Long;
> 
> 
> 
> my $outfile;
> 
> GetOptions(
> 
>            'o|outfile=s' => \$outfile,
> 
>            );
> 
> 
> 
> my $outfa= Bio::SeqIO -> new (-file => ">$outfile",
> 
>                               -format => 'Fasta'
> 
>                              );
> 
> 
> 
> my $db      = Bio::DB::GFF-> new ( - adaptor => 'dbi::mysql',
> 
>                                -dsn        =>
> 'dbi:mysql:database=cc;host=localhost',
> 
>                                -fasta      => '/gbrowse/databases/cc'
> 
>                                );
> 
> 
> 
> $db->add_aggregator('transcript{CDS/mRNA}');
> 
> 
> 
>      for (my $i =1; $i<=20; $i++){
> 
>       my $pfamname="Peptidase_C$i";
> 
>         my @pfams = $db->get_feature_by_name( Domain => $pfamname);
> 
>         foreach my $pfamhit (@pfams){
> 
>              my $desc = $pfamname;
> 
>              my $score=$pfamhit->score;
> 
>              my $name = $pfamhit->name;
> 
>              $desc.= " $score ";
> 
>              $desc.= $pfamhit->location->seq_id();
> 
>              $desc.= ": ";
> 
> #
> 
> # Here's where I'm selecting predicted genes that overlap the Pfam hit
> 
> #
> 
>              my @genes = $db -> overlapping_features(
> 
>                                    -refseq => $pfamhit->location->seq_id,
> 
>                                    -start => $pfamhit->start,
> 
>                                    -stop => $pfamhit->stop,
> 
>                                    -types =>'transcript:GLEAN'
> 
>                                    );
> 
> #
> 
> # Now I'm choosing the ones with names I want out of the selected genes
> 
> #
> 
>                   foreach my $gene (@genes){
> 
>                        my $gid=$gene->display_id();
> 
>                        if ($gid =~/aug_GLEAN/){
> 
>                             $desc.=$gene->start;
> 
>                             $desc.=" - ";
> 
>                             $desc.=$gene->stop;
> 
> #
> 
> # Here I'm splicing the gene, tacking on a description, and outputting it.
> 
> #
> 
> 
> 
>                             my $splseq = $gene->spliced_seq();
> 
>                             $splseq->desc($desc);
> 
>                             $splseq->display_id($gid);
> 
>                             $outfa->write_seq($splseq);
> 
> 
> 
>                        }# end if aug_GLEAN
> 
>                   }# end foreach gene
> 
>          }# end foreach pfamhit
> 
>    } # end for numbers
> 
> close OUT;
> 
> 
> 
> 
> 
> Allen Gathman
> 
> http://cstl-csm.semo.edu/gathman
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l