[Bioperl-l] Bio::DB::GFF3 nightmare
Lincoln Stein
lstein at cshl.edu
Thu Mar 30 23:40:14 UTC 2006
If you insist on using the Bio::DB::GFF database with GFF3 data, you must that
the segment method returns a segment of the genome corresponding to the start
and end positions of the indicated feature. When you call features() it
returns everything that overlaps the region.
The API to use to get a single feature is get_feature_by_name():
my $transcript = $dmdb->get_feature_by_name('CG30501-RA');
my @exons = $transcript->get_SeqFeatures;
Or, if you want all transcripts in the gene CG17800:
my @transcripts = $dmdb->get_feature_by_attribute(Gene=>'CG17800');
for my $transcript (@transcripts) {
my @exons = $transcript->get_SeqFeatures;
}
The latter depends on your having changed the gene Parent attribute into a
Gene attribute as recommended in my previous email.
Lincoln
On Wednesday 29 March 2006 18:46, Marco Blanchette wrote:
> Dear all--
>
> I have been trying to display exon/intron structure of mRNAs for a given
> gene the D. melanogaster GadFly GFF3 annotation 4.2.1 loaded into mySQL
> using bp_bulk_loadd_gff.pl. I keep getting mRNAs from other genes that
> fall within the segment of the queried gene. For example:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::DB::GFF;
>
> my $agg1 = Bio::DB::GFF::Aggregator->new( -method => 'pre_mRNA',
> -sub_parts =>
> ['exon','five_prime_UTR','three_prime_UTR'],
> );
>
> my $dmdb = Bio::DB::GFF ->new( -adaptor => 'dbi::mysql',
> -dsn =>
> 'dbi:mysql:database=dmel_421;host=riolab.net',
> -user => 'guest',
> -aggregators=> [$agg1],
> );
>
> my @genes = qw (CG17800);
>
> for my $gene (@genes){
> my $tg = $dmdb->segment(-name => $gene);
>
> my @transcripts = $tg->features(-type => 'pre_mRNA',
> );
>
> for my $tc (@transcripts){
> my %atts = $tc->attributes;
> print "$_ => $atts{$_}\n" foreach (keys %atts);
> }
> }
>
> This script generate the output:
> Parent => CG30501-RA
> Name => Dscam:23
> Parent => CG17800-RE
> Parent => CG30500-RA
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
> Name => Dscam:23
> Parent => CG17800-RE
>
> Where Neither CG30501-RA nor CG30500-RA are coming from the gene CG17800.
> If I pass @transcripts to a Bio::Graphics::Panel object, I get, of course,
> all the different mRNAs even the one that don¹t belong to the CG17800 gene.
>
> I just can¹t figure out how to restrict the $tg->feature() call to the
> queried gene (ie CG17800)
>
> Many thanks
> ______________________________
> Marco Blanchette, Ph.D.
>
> mblanche at uclink.berkeley.edu
>
> Donald C. Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
>
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062
--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu
More information about the Bioperl-l
mailing list