[Bioperl-l] Bio::DB::GFF and GadFly GFF3 issues
Marco Blanchette
mblanche at berkeley.edu
Tue Mar 7 00:02:27 UTC 2006
Lincoln--
I did what you suggested and still can't get a drawing of the exon/intron
mRNA structure using the following script:
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;
use Bio::Graphics;
use Bio::SeqFeature::Generic;
my $dmdb = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
-dsn => "chr4_mod",
);
my @genes = ('CG2381','CG2041'); ##two genes on the fourth chromosome
foreach my $gene (@genes){
my $tg = $dmdb->segment(-name => $gene);
if ($tg){
my $panel = Bio::Graphics::Panel->new(
-length => $tg->length,
-width => 800,
-pad_left => 10,
-pad_right => 10,
);
my $full_length =
Bio::SeqFeature::Generic->new(-start=>1,-end=>$tg->length);
$panel->add_track($full_length,
-glyph => 'arrow',
-tick => 2,
-fgcolor => 'black',
-double => 1,
);
my $tcs = $tg->features(-types =>'processed_transcript',
-attributes => {Gene=> $gene},
-iterator => 1);
while (my $tc = $tcs->next_seq){
my $track = $panel->add_track( generic => $tc,
-bgcolor => 'blue',
-label => 1,
-bump => 0,
#-connector => 'solid',
);
}
open FH, ">$gene.png" || die "Can't create file $gene.png\n";
print FH $panel->png;
$panel->finished;
close FH;
}
}
I get 2 files with the mRNA tracts labeled with the RNA id but without any
exon/intron structure.
However, If I extract the exons first and draw them as in:
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;
use Bio::Graphics;
use Bio::SeqFeature::Generic;
my $dmdb = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
-dsn => "chr4_mod",
);
my @genes = ('CG2381','CG2041'); ##two genes on the fourth chromosome
foreach my $gene (@genes){
my $tg = $dmdb->segment(-name => $gene);
if ($tg){
my $panel = Bio::Graphics::Panel->new(
-length => $tg->length,
-width => 800,
-pad_left => 10,
-pad_right => 10,
);
my $full_length =
Bio::SeqFeature::Generic->new(-start=>1,-end=>$tg->length);
$panel->add_track($full_length,
-glyph => 'arrow',
-tick => 2,
-fgcolor => 'black',
-double => 1,
);
my @transcripts = $tg->features(-types =>'processed_transcript',
-attributes => {Gene => $gene},
);
for my $tc (@transcripts){
my @exons = $tc->features( -types => 'exon',
-attributes => {Parent =>
$tc->group});
my @introns = $tc->features(-types => 'intron',
-attributes => {Parent =>
$tc->group});
my $track = $panel->add_track( generic => \@exons,
-bgcolor => 'blue',
-label => 1,
-bump => 0,
);
}
open FH, ">$gene.png" || die "Can't create file $gene.png\n";
print FH $panel->png;
$panel->finished;
close FH;
}
}
I get the exon drawn correctly but I can add line connecting them. Moreover,
since each exon are individual feature, I can't get the mRNA id displayed on
each tract...
I tried to pass the @intron array to the glyph in different ways to draw hat
line between exon without any success.
Am I going the right direction or their is some better workaround?
Many thanks
Marco
On 3/6/06 10:31 AM, "Lincoln Stein" <lstein at cshl.edu> wrote:
> Hi,
>
> Since I wrote the last message I have done some more testing and have
> determined that the flybase GFF3 files cannot be stored in Bio::DB::GFF due
> to limitations in the Bio::DB::GFF data model. The issue is that Bio::DB::GFF
> can only store one level of parentage, and not the two levels needed by
> flybase genes.
>
> Here is a quick fix to preprocess the gff3 files so that they can be used by
> Bio::DB::GFF:
>
> while (<>) {
> my @fields = split "\t";
> next unless $fields[2] eq 'mRNA';
> s/Parent=([^;]+)/Gene=$1/;
> } continue {
> print;
> }
>
> This turns the "Parent" field of mRNA lines into a "Gene" attribute. You can
> then find all transcripts corresponding to a particular gene in much the way
> you tried earlier:
>
> my $tcs = $tg->features(-types =>'processed_transcript',
> -attributes => {Gene=> $gene},
> -iterator => 1);
>
> I am going back to work on Bio::DB::GFF3, which will fix this problem.
>
> Lincoln
>
> On Monday 06 March 2006 00:02, Marco Blanchette wrote:
>> Dear all--
>>
>> I am trying to forge my first bioperl weapons with the
>> Bio::DB::GFF and Bio::Graphics modules. My goal is to display genes with
>> their underlying mRNAs and later on add addition useful info (ie binding
>> site for our preferred proteins).
>>
>> I loaded the GadFly gff3 annotation in a mysql database using
>> bulk_load_gff.pl and I am trying to pass a Bio::SeqFeatureI to the
>> Bio::Graphics::add_feature method.
>>
>> My understanding is that:
>> my $tcs = $tg->features(-types =>'processed_transcript',
>> -attributes => {Parent => $gene},
>> -iterator => 1);
>>
>> Produces a Bio::SeqIO object that can be iterate through the next_seq
>> method to get a Bio::Seq object that could be used to extract a
>> Bio::SeqFeatureI by using the get_SeqFeatures method.
>>
>> Somehow, my script does not produce the expected results. Could somebody
>> put me on back on the right track.
>>
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>> use Bio::DB::GFF;
>> use Bio::Graphics;
>>
>> my $dmdb = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>> -dsn => "chr4",
>> );
>>
>>
>> my @genes = ('CG2041'); ##a gene on the fourth chromosome
>>
>> foreach my $gene (@genes){
>>
>> my $geneseg = $dmdb->segment(-name => $gene, -merge);
>>
>> if ($geneseg){
>>
>> my @tgs = $geneseg->features(-types => 'gene');
>>
>> for my $tg (@tgs){
>>
>> my $length = $tg->length();
>>
>> my $panel = Bio::Graphics::Panel->new(-length => $length, -width
>> => 800);
>>
>> my $track = $panel->add_track( -glyph => 'generic',
>> -label => 1);
>>
>> my $tcs = $tg->features(-types =>'processed_transcript',
>> -attributes => {Parent => $gene},
>> -iterator => 1);
>>
>> while ( my $tc = $tcs->next_seq ){
>> $track->add_feature($tc->get_SeqFeatures);
>> }
>>
>> print $panel->png;
>> }
>> }
>> }
>>
>> Many thanks
>>
>>
>> Marco Blanchette, Ph.D.
>>
>> mblanche at berkeley.edu
>>
>> Donald C. Rio's lab
>> Department of Molecular and Cell Biology
>> 16 Barker Hall
>> University of California
>> Berkeley, CA 94720-3204
>>
>> Tel: (510) 642-1084
>> Cell: (510) 847-0996
>> Fax: (510) 642-6062
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
Marco Blanchette, Ph.D.
mblanche at berkeley.edu
Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204
Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
More information about the Bioperl-l
mailing list