[Bioperl-l] Bio::DB::GFF and GadFly GFF3 issues

Marco Blanchette mblanche at berkeley.edu
Tue Mar 7 00:02:27 UTC 2006


Lincoln--

I did what you suggested and still can't get a drawing of the exon/intron
mRNA structure using the following script:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $dmdb = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                   -dsn => "chr4_mod",
                                   );
my @genes = ('CG2381','CG2041'); ##two genes on the fourth chromosome
foreach my $gene (@genes){
    
    my $tg = $dmdb->segment(-name => $gene);
    
    if ($tg){
        
        my $panel = Bio::Graphics::Panel->new(
                        -length => $tg->length,
                        -width  => 800,
                        -pad_left => 10,
                        -pad_right => 10,
                        );
                   
        my $full_length =
Bio::SeqFeature::Generic->new(-start=>1,-end=>$tg->length);
    
        $panel->add_track($full_length,
                            -glyph   => 'arrow',
                            -tick    => 2,
                            -fgcolor => 'black',
                            -double  => 1,
                            );
                   
        my $tcs = $tg->features(-types =>'processed_transcript',
                                -attributes => {Gene=> $gene},
                                -iterator => 1);
        
        while (my $tc = $tcs->next_seq){
            
            my $track = $panel->add_track(    generic => $tc,
                                            -bgcolor => 'blue',
                                            -label  => 1,
                                            -bump => 0,
                                            #-connector => 'solid',
                                            );
        }
        open FH, ">$gene.png" || die "Can't create file $gene.png\n";
        print FH $panel->png;
        $panel->finished;
        close FH;
    }
}

I get 2 files with the mRNA tracts labeled with the RNA id but without any
exon/intron structure.

However, If I extract the exons first and draw them as in:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $dmdb = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                   -dsn => "chr4_mod",
                                   );
my @genes = ('CG2381','CG2041'); ##two genes on the fourth chromosome
foreach my $gene (@genes){
    
    my $tg = $dmdb->segment(-name => $gene);
    
    if ($tg){
        my $panel = Bio::Graphics::Panel->new(
                        -length => $tg->length,
                        -width  => 800,
                        -pad_left => 10,
                        -pad_right => 10,
                        );
                   
        my $full_length =
Bio::SeqFeature::Generic->new(-start=>1,-end=>$tg->length);
    
        $panel->add_track($full_length,
                            -glyph   => 'arrow',
                            -tick    => 2,
                            -fgcolor => 'black',
                            -double  => 1,
                            );
        
        my @transcripts = $tg->features(-types =>'processed_transcript',
                                        -attributes => {Gene => $gene},
                                        );
    
        for my $tc (@transcripts){
            my @exons = $tc->features(    -types => 'exon',
                                        -attributes => {Parent =>
$tc->group});
                   
            my @introns = $tc->features(-types => 'intron',
                                        -attributes => {Parent =>
$tc->group});
            
            my $track = $panel->add_track(    generic => \@exons,
                                            -bgcolor => 'blue',
                                            -label  => 1,
                                            -bump => 0,
                                            );
        }
        open FH, ">$gene.png" || die "Can't create file $gene.png\n";
        print FH $panel->png;
        $panel->finished;
        close FH;
    }
}

I get the exon drawn correctly but I can add line connecting them. Moreover,
since each exon are individual feature, I can't get the mRNA id displayed on
each tract...

I tried to pass the @intron array to the glyph in different ways to draw hat
line between exon without any success.

Am I going the right direction or their is some better workaround?

Many thanks

Marco


On 3/6/06 10:31 AM, "Lincoln Stein" <lstein at cshl.edu> wrote:

> Hi,
> 
> Since I wrote the last message I have done some more testing and have
> determined that the flybase GFF3 files cannot be stored in Bio::DB::GFF due
> to limitations in the Bio::DB::GFF data model. The issue is that Bio::DB::GFF
> can only store one level of parentage, and not the two levels needed by
> flybase genes.
> 
> Here is a quick fix to preprocess the gff3 files so that they can be used by
> Bio::DB::GFF:
> 
> while (<>) {
> my @fields = split "\t";
> next unless $fields[2] eq 'mRNA';
> s/Parent=([^;]+)/Gene=$1/;
> } continue {
> print;
> }
> 
> This turns the "Parent" field of mRNA lines into a "Gene" attribute. You can
> then find all transcripts corresponding to a particular gene in much the way
> you tried earlier:
> 
>  my $tcs = $tg->features(-types =>'processed_transcript',
>                                          -attributes => {Gene=> $gene},
>                                          -iterator => 1);
> 
> I am going back to work on Bio::DB::GFF3, which will fix this problem.
> 
> Lincoln
> 
> On Monday 06 March 2006 00:02, Marco Blanchette wrote:
>> Dear all--
>> 
>> I am trying to forge my first bioperl weapons with the
>> Bio::DB::GFF and Bio::Graphics modules. My goal is to display genes with
>> their underlying mRNAs and later on add addition useful info (ie binding
>> site for our preferred proteins).
>> 
>> I loaded the GadFly gff3 annotation in a mysql database using
>> bulk_load_gff.pl and I am trying to pass a Bio::SeqFeatureI to the
>> Bio::Graphics::add_feature method.
>> 
>> My understanding is that:
>> my $tcs = $tg->features(-types =>'processed_transcript',
>>                                         -attributes => {Parent => $gene},
>>                                         -iterator => 1);
>> 
>> Produces a Bio::SeqIO object that can be iterate through the next_seq
>> method to get a Bio::Seq object that could be used to extract a
>> Bio::SeqFeatureI by using the get_SeqFeatures method.
>> 
>> Somehow, my script does not produce the expected results. Could somebody
>> put me on back on the right track.
>> 
>> 
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>> use Bio::DB::GFF;
>> use Bio::Graphics;
>> 
>> my $dmdb = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>                                    -dsn => "chr4",
>>                                    );
>> 
>> 
>> my @genes = ('CG2041'); ##a gene on the fourth chromosome
>> 
>> foreach my $gene (@genes){
>> 
>>     my $geneseg = $dmdb->segment(-name => $gene, -merge);
>> 
>>     if ($geneseg){
>> 
>>     my @tgs = $geneseg->features(-types => 'gene');
>> 
>>     for my $tg (@tgs){
>> 
>>         my $length = $tg->length();
>> 
>>         my $panel = Bio::Graphics::Panel->new(-length => $length, -width
>> => 800);
>> 
>>         my $track = $panel->add_track(    -glyph => 'generic',
>>                                         -label  => 1);
>> 
>>         my $tcs = $tg->features(-types =>'processed_transcript',
>>                                         -attributes => {Parent => $gene},
>>                                         -iterator => 1);
>> 
>>         while ( my $tc = $tcs->next_seq ){
>>             $track->add_feature($tc->get_SeqFeatures);
>>         }
>> 
>>         print $panel->png;
>>     }
>> }
>> }
>> 
>> Many thanks
>> 
>> 
>> Marco Blanchette, Ph.D.
>> 
>> mblanche at berkeley.edu
>> 
>> Donald C. Rio's lab
>> Department of Molecular and Cell Biology
>> 16 Barker Hall
>> University of California
>> Berkeley, CA 94720-3204
>> 
>> Tel: (510) 642-1084
>> Cell: (510) 847-0996
>> Fax: (510) 642-6062
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


Marco Blanchette, Ph.D.

mblanche at berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062








More information about the Bioperl-l mailing list