[Bioperl-l] `get_feature_by_name` not working after migrating to Bio::DB::SeqFeature::Store from a Bio::DB::GFF backend

Vivek Krishnakumar vivekkrishnakumar at gmail.com
Wed Feb 1 15:14:05 UTC 2012


Hello,

I would like to state my problem: Previously, i was working with a
Bio::DB:GFF backend database and was able to retrieve features from the
database by feature name using the function 'get_feature_by_name()'. But
recently, when I made the switch to Bio::DB::SeqFeature::Store to store the
exact same data points as before in the new db, invoking the
'get_feature_by_name()' function returns nothing.

Please refer to the following code snippet which is supposed to retrieve a
certain gene "locus" feature from the backend database and use that to get
the 'start' and 'end' coordinate of the feature based on the 'strand' on
which it is present.

my ($locus_obj, $gene_models) = get_annotation_db_features($locus,
$gff_dbh);

sub get_annotation_db_features {
    my ($locus, $gff_dbh) = @_;

    my ($locus_obj) = $gff_dbh->get_feature_by_name('Gene' => '$locus');

    my ($end5, $end3) = $locus_obj->strand == 1
      ? ($locus_obj->start, $locus_obj->end)
      : ($locus_obj->end, $locus_obj->start);

    my $segment = $gff_dbh->segment($locus_obj->refseq, $end5, $end3);
    my @gene_models =
      $segment->features('processed_transcript:working_models', -attributes
=> { 'Gene' => $locus });

    #will have to sort the gene models
    return ($locus_obj, \@gene_models);
}

Also, here is a snippet of the GFF3 file that is used to populate the
backend database:
##gff-version 3
chr2    working_models  gene    30427563    30429139    .   -   .
ID=gene_35804;Note=Zinc transporter;Name=Medtr2g097580
chr2    working_models  mRNA    30427563    30429139    .   -   .
ID=mrna_36255;Parent=gene_35804;Name=Medtr2g097580.1;conf_class=F
chr2    working_models  exon    30428491    30429139    .   -   .
ID=exon_120028;Parent=mrna_36255
chr2    working_models  exon    30427563    30428147    .   -   .
ID=exon_120029;Parent=mrna_36255
chr2    working_models  CDS 30428491    30429109    .   -   0
ID=cds_120028;Parent=mrna_36255
chr2    working_models  CDS 30427756    30428147    .   -   2
ID=cds_120029;Parent=mrna_36255

Considering that this gene locus is unique in the entire GFF file, If this
above GFF is loaded into the SeqFeature::Store database, you would expect
that running the following query should yield a count of "1":

SELECT count(f.id) FROM feature as f, name as n WHERE
(n.id=f.id<https://owa.jcvi.org/OWA/redir.aspx?C=6752206bd7374e2483702b789422d7b3&URL=http%3a%2f%2ff.id%2f>
AND
n.name = 'Medtr2g097580' AND n.display_name > 0);

And in my case, it does yield "1".

But if I use the function *get_feature_by_name*, retrieve the locus object
and try to get the strand of this object, I get the following error:

[error] Can't call method "strand" on an undefined value at
get_db_features.pl line 910.

As specified earlier, I do not have any problems if the backend is
Bio::DB::GFF.

>From the SeqFeature::Store documentation, we know that
'get_feature_by_name()' is in place for backward compatibility, I even
tried modifying the code snippet to call 'get_features_by_name()' instead,
but no matter what, I do not get any locus object back from this subroutine!

Could someone please guide me in the right direction and let me know if I
am making any mistakes here when migrating from GFF2 to GFF3?
Thanks in advance.
~ Vivek



More information about the Bioperl-l mailing list