[Bioperl-l] Bio::DB::GFF3 nightmare
Lincoln Stein
lstein at cshl.edu
Thu Mar 30 20:41:39 UTC 2006
The Bio::DB::GFF module does not correctly support GFF3 format because it only
handles one level of containment rather than two. You will have to filter out
the overlapping features.
Alternatively, wait another two weeks or so and the new Bio::DB::SeqFeature
module will support GFF3 correctly. If you like, I can send you something
that will work, but you will have to rewrite your scripts when the API is
modified.
Lincoln
On Wednesday 29 March 2006 18:46, Marco Blanchette wrote:
> Dear all--
>
> I have been trying to display exon/intron structure of mRNAs for a given
> gene the D. melanogaster GadFly GFF3 annotation 4.2.1 loaded into mySQL
> using bp_bulk_loadd_gff.pl. I keep getting mRNAs from other genes that
> fall within the segment of the queried gene. For example:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::DB::GFF;
>
> my $agg1 = Bio::DB::GFF::Aggregator->new( -method => 'pre_mRNA',
> -sub_parts =>
> ['exon','five_prime_UTR','three_prime_UTR'],
> );
>
> my $dmdb = Bio::DB::GFF ->new( -adaptor => 'dbi::mysql',
> -dsn =>
> 'dbi:mysql:database=dmel_421;host=riolab.net',
> -user => 'guest',
> -aggregators=> [$agg1],
> );
>
> my @genes = qw (CG17800);
>
> for my $gene (@genes){
> my $tg = $dmdb->segment(-name => $gene);
>
> my @transcripts = $tg->features(-type => 'pre_mRNA',
> );
>
> for my $tc (@transcripts){
> my %atts = $tc->attributes;
> print "$_ => $atts{$_}\n" foreach (keys %atts);
> }
> }
>
> This script generate the output:
> Parent => CG30501-RA
> Name => Dscam:23
> Parent => CG17800-RE
> Parent => CG30500-RA
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
>
> Where Neither CG30501-RA nor CG30500-RA are coming from the gene CG17800.
> If I pass @transcripts to a Bio::Graphics::Panel object, I get, of course,
> all the different mRNAs even the one that don¹t belong to the CG17800 gene.
>
> I just can¹t figure out how to restrict the $tg->feature() call to the
> queried gene (ie CG17800)
>
> Many thanks
> ______________________________
> Marco Blanchette, Ph.D.
>
> mblanche at uclink.berkeley.edu
>
> Donald C. Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
>
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062
--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu
More information about the Bioperl-l
mailing list