[Bioperl-l] Problem with Bio::DB::GFF
Lincoln Stein
lstein at cshl.edu
Fri Apr 8 11:35:54 EDT 2005
Hi Allen,
This is one of the oddities of the GFF2 format (and something that the
GFF3 format fixes). The first attribute/value pair in the ninth
column becomes the primary ID for the feature, so instead of fetching
by attribute, you must fetch by name:
my @feature = $db->get_feature_by_name(Gene=>'lcc1');
There may be several similarly-named genes, hence the list result.
Lincoln
On Friday 08 April 2005 10:13 am, Gathman, Allen wrote:
> Hello all -
>
>
>
> Although this problem arises in the context of Gbrowse, I'm pretty
> sure it's more of a straight bioperl question.
>
> I'm trying to pull specific features out of a database that I'm
> using to run Gbrowse, using the tools in Bio::DB::GFF. Here's a
> test program:
>
>
>
> #!/usr/bin/perl
>
>
>
> use Bio::DB::GFF;
>
> my $db=Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
>
> -dsn =>
> 'dbi:mysql:database=ccsmall;host=localhost',
>
> -fasta => '/gbrowse/databases/cc'
>
> );
>
> $outfile= $ARGV[0];
>
> open (OUT, ">$outfile");
>
> @gfeat=$db->features(-attributes => {-Gene => 'lcc1'});
>
> print OUT "Genes: @gfeat\n";
>
> foreach $feat (@gfeat){
>
> print OUT "Start is " . $feat->start . "\n";
>
> }
>
> close OUT;
>
>
>
> When I run this script, the output file says:
>
>
>
> Genes:
>
>
>
>
>
> And that's it. No features are returned. A gff file loaded into
> the ccsmall database using load_gff.pl has this first line:
>
>
>
> ccin_Contig34 CURATED_PRIVATE mRNA 64853 68924 . - .
> Gene lcc1; Eval 0.0; Lab Kues
>
>
>
>
> So it would appear that the attribute -Gene => 'lcc1' should pull
> out something - but obviously I'm missing something here, because
> it doesn't.
>
>
>
> I wrote another script that uses the get_seq_stream method, and it
> pulls out this gene, among others. I've been able to get at the
> 'lcc1' by looking at $f->group for each feature in the stream:
>
>
>
> $gefeat=$db->get_seq_stream(-type => ['mRNA:CURATED_PRIVATE']);
>
> while ($f=$gefeat->next_seq) {
>
> $group=$f->group;
>
> if ($group='lcc1') { bla bla bla }
>
> }
>
>
>
> But this is sort of (well, okay, REALLY) clumsy.
>
>
>
> So my question is, what am I doing wrong with the "features"
> method? I'd really prefer to be able to get at features by name if
> possible. And for that matter, if I wanted to get at them by Eval
> or Lab, I'm not sure what would do that at all.
>
>
>
> Thanks for any suggestions you can make --
>
>
>
> -Allen
>
>
>
> Allen Gathman
>
> http://cstl-csm.semo.edu/gathman
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
NOTE: Please copy Sandra Michelsen <michelse at cshl.edu> on
all emails regarding scheduling and other time-critical topics.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050408/2dfc7357/attachment.bin
More information about the Bioperl-l
mailing list