[Bioperl-l] Problem with Bio::DB::GFF

Lincoln Stein lstein at cshl.edu
Fri Apr 8 11:35:54 EDT 2005


Hi Allen,

This is one of the oddities of the GFF2 format (and something that the 
GFF3 format fixes).  The first attribute/value pair in the ninth 
column becomes the primary ID for the feature, so instead of fetching 
by attribute, you must fetch by name:

	my @feature = $db->get_feature_by_name(Gene=>'lcc1');

There may be several similarly-named genes, hence the list result.

Lincoln


On Friday 08 April 2005 10:13 am, Gathman, Allen wrote:
> Hello all -
>
>
>
> Although this problem arises in the context of Gbrowse, I'm pretty
> sure it's more of a straight bioperl question.
>
> I'm trying to pull specific features out of a database that I'm
> using to run Gbrowse, using the tools in Bio::DB::GFF.  Here's a
> test program:
>
>
>
> #!/usr/bin/perl
>
>
>
> use Bio::DB::GFF;
>
> my $db=Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
>
>                          -dsn     =>
> 'dbi:mysql:database=ccsmall;host=localhost',
>
>                          -fasta   => '/gbrowse/databases/cc'
>
>                         );
>
> $outfile= $ARGV[0];
>
> open (OUT, ">$outfile");
>
> @gfeat=$db->features(-attributes => {-Gene => 'lcc1'});
>
> print OUT "Genes: @gfeat\n";
>
> foreach $feat (@gfeat){
>
>     print OUT "Start is " . $feat->start . "\n";
>
>     }
>
> close OUT;
>
>
>
> When I run this script, the output file says:
>
>
>
> Genes:
>
>
>
>
>
> And that's it.  No features are returned.  A gff file loaded into
> the ccsmall database using load_gff.pl has this first line:
>
>
>
> ccin_Contig34     CURATED_PRIVATE   mRNA  64853 68924 .     -     .
>     Gene lcc1; Eval 0.0; Lab Kues
>
>
>
>
> So it would appear that the attribute -Gene => 'lcc1' should pull
> out something - but obviously I'm missing something here, because
> it doesn't.
>
>
>
> I wrote another script that uses the get_seq_stream method, and it
> pulls out this gene, among others.  I've been able to get at the
> 'lcc1' by looking at $f->group for each feature in the stream:
>
>
>
> $gefeat=$db->get_seq_stream(-type => ['mRNA:CURATED_PRIVATE']);
>
> while ($f=$gefeat->next_seq) {
>
>       $group=$f->group;
>
>        if ($group='lcc1') { bla bla bla }
>
> }
>
>
>
> But this is sort of (well, okay, REALLY) clumsy.
>
>
>
> So my question is, what am I doing wrong with the "features"
> method?  I'd really prefer to be able to get at features by name if
> possible.  And for that matter, if I wanted to get at them by Eval
> or Lab, I'm not sure what would do that at all.
>
>
>
> Thanks for any suggestions you can make --
>
>
>
> -Allen
>
>
>
> Allen Gathman
>
> http://cstl-csm.semo.edu/gathman
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen <michelse at cshl.edu> on
all emails regarding scheduling and other time-critical topics.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050408/2dfc7357/attachment.bin


More information about the Bioperl-l mailing list