[Bioperl-l] Bio::DB::GFF3 nightmare

Lincoln Stein lstein at cshl.edu
Thu Mar 30 20:41:39 UTC 2006


The Bio::DB::GFF module does not correctly support GFF3 format because it only 
handles one level of containment rather than two. You will have to filter out 
the overlapping features. 

Alternatively, wait another two weeks or so and the new Bio::DB::SeqFeature 
module will support GFF3 correctly. If you like, I can send you something 
that will work, but you will have to rewrite your scripts when the API is 
modified.

Lincoln

On Wednesday 29 March 2006 18:46, Marco Blanchette wrote:
> Dear all--
>
> I have been trying to display exon/intron structure of mRNAs for a given
> gene the D. melanogaster GadFly GFF3 annotation 4.2.1 loaded into mySQL
> using bp_bulk_loadd_gff.pl. I keep getting  mRNAs from other genes that
> fall within the segment of the queried gene. For example:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::DB::GFF;
>
> my $agg1 = Bio::DB::GFF::Aggregator->new(    -method => 'pre_mRNA',
>                                             -sub_parts    =>
> ['exon','five_prime_UTR','three_prime_UTR'],
>                                         );
>
> my $dmdb = Bio::DB::GFF    ->new( -adaptor => 'dbi::mysql',
>                           -dsn =>
> 'dbi:mysql:database=dmel_421;host=riolab.net',
>                           -user => 'guest',
>                           -aggregators=> [$agg1],
>                 );
>
> my @genes = qw (CG17800);
>
> for my $gene (@genes){
>     my $tg = $dmdb->segment(-name => $gene);
>
>     my @transcripts = $tg->features(-type => 'pre_mRNA',
>                                      );
>
>      for my $tc (@transcripts){
>          my %atts = $tc->attributes;
>          print "$_ => $atts{$_}\n" foreach (keys %atts);
>      }
> }
>
> This script generate the output:
> Parent => CG30501-RA
> Name => Dscam:23
> Parent => CG17800-RE
> Parent => CG30500-RA
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
>
> Where Neither CG30501-RA nor CG30500-RA are coming from the gene CG17800.
> If I pass @transcripts to a Bio::Graphics::Panel object, I get, of course,
> all the different mRNAs even the one that don¹t belong to the CG17800 gene.
>
> I just can¹t figure out how to restrict the $tg->feature() call to the
> queried gene (ie CG17800)
>
> Many thanks
> ______________________________
> Marco Blanchette, Ph.D.
>
> mblanche at uclink.berkeley.edu
>
> Donald C. Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
>
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu




More information about the Bioperl-l mailing list