[Bioperl-l] Bio::DB::GFF still a nightmare...

Fri Mar 31 00:59:55 UTC 2006

Lincoln--

Sorry for the confusion, I meant to say Bio::DB:GFF. As for the data file, I
have been using a modify version of the GadFly gff3 v4.2.1 annotation where
the Parent tag was modify to Gene using the following script you provided me
with some weeks ago:
    while (<>) {
        my @fields = split "\t";
        next unless $fields[2] eq 'mRNA';
        s/Parent=([^;]+)/Gene=$1/;
    } continue {
        print;
    }

The data file is publicly available using the guest user in the database
dmel_421 at riolab.net. The script I am testing is:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $agg1 = Bio::DB::GFF::Aggregator->new(    -method => 'pre_mRNA',
                                            -main_method => 'mRNA',
                                            -sub_parts    =>
['exon','five_prime_UTR','three_prime_UTR'],
                                        );

my $dmdb = Bio::DB::GFF    ->new( -adaptor => 'dbi::mysql',
                          -dsn =>
'dbi:mysql:database=dmel_421;host=riolab.net',
                          -user => 'guest',
                          -aggregators=> [$agg1],
                );

my @genes = qw (CG17800);

for my $gene (@genes){
    my $tg = $dmdb->segment(-name => $gene);

    my @transcripts = $tg->features(-type => 'pre_mRNA',
                                     #-attributes => {Gene => $gene},
                                     );

     for my $tc (@transcripts){
         my %atts = $tc->attributes;
         print "$_ => $atts{$_}\n" foreach (keys %atts);
         print "\n";
     }

    my $panel = Bio::Graphics::Panel->new(
                  -length => $tg->length,
                  -width  => 800,
                  -pad_left => 10,
                  -pad_right => 10,
                 );

    $panel->add_track(processed_transcript=>\@transcripts,
              -label=>1,
              -implied_utrs=>1,
              );

    open FH, ">$gene.png" || die "Can't create file $gene.png\n";
    print "saving $gene.png\n";
    print FH $panel->png;
    $panel->finished;
    close FH;
}

If you run this script as is, you will get part of CG30500 and CG30501
within together with all CG17800 mRNAs. I my understanding of the module was
that by selecting the Gene=>CG_ID attribute in the features methods, I would
trap only the transcript from CG17800 (rerun the script with un-commented
line 25 #-attributes => {Gene => $gene},). Buy doing that the script trap
the right transcripts (as seen in the STDOUT) but looses the 'pre-mRNA'
aggregate... 

As you suggest, should I stop trying and wait for the release of the
Bio::DB::GFF3 module?

Again, real sorry for the confusion.

Marco

On 3/30/06 12:11 PM, "Lincoln Stein" <lstein at cshl.edu> wrote:

> Hi Marco,
> 
> There is no Bio::DB::GFF3 module, and there is no attachment! Send out the
> data file and the script you are using and I'll comment on it.
> 
> Lincoln
> 
> On Wednesday 29 March 2006 21:19, Marco Blanchette wrote:
>> Dear all--
>> 
>> There¹s definitely something I don¹t get with the Bio::DB::GFF3 module...
>> When I run the following script, I get the drawing I want but contaminated
>> with pieces of overlapping genes (see attached CG17800.png_v1). My
>> understanding is that the aggregate pre-mRNAs contain the attribute
>> ŒGene->CG_ID¹ (see the output). So when I uncomment line 25 '-attributes =>
>> {Gene => $gene},' in order to get only the transcript from the queried gene

Marco Blanchette, Ph.D.

mblanche at berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062