[Bioperl-l] `get_feature_by_name` not working after migrating to Bio::DB::SeqFeature::Store from a Bio::DB::GFF backend

Vivek Krishnakumar vivekkrishnakumar at gmail.com
Wed Feb 1 17:37:47 UTC 2012


Hi Scott and Lincoln,

Thanks for all your suggestions. I figured out the mistake I was making.

Apart from switching to get_features_by_name(), I also need to switch from
$locus_obj->refseq to $locus_obj->seq_id to get the reference sequence
identifier. This was interfering with the $gff_dbh->segment() call.

Also, it looks like I do not need to embed the 'Gene' class in the GFF file
anymore.

Thanks again.
Regards,
~Vivek

On Wed, Feb 1, 2012 at 11:51 AM, Vivek Krishnakumar <
vivekkrishnakumar at gmail.com> wrote:

> Hi Scott,
>
> Thanks very much for your suggestions. Looks I did miss it somehow
> (confusion was caused because I was using both bioperl-l at googlegroups and
> bioperl-l at open-bio)
>
> Anyway, I had modified my function exactly like your suggestion:
>
> my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, -type =>
> 'gene');
>
> But doing so just returns the following error:
>
> -------------------- EXCEPTION --------------------
> MSG: segment() called in a scalar context but multiple features match.
> Either call in a list context or narrow your search using the -types or -class arguments
>
> STACK Bio::DB::SeqFeature::Store::segment /usr/local/packages/perl-5.10.1/lib/5.10.1/Bio/DB/SeqFeature/Store.pm:1322
> STACK main::get_annotation_db_features /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:899
> STACK main::structural_annotation /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:660
> STACK toplevel /opt/www/medicago/cgi-bin/medicago/eucap/eucap.pl:119
> -------------------------------------------
>
> which would suggest to oneself that there are several such features with
> the same ID. But in fact, I was able to verify by querying the database
> that I have only one such locus.
>
> As for your question regarding how $locus is populated, it is populated
> from a CGI parameter passed to the script. I know that I am only passing it
> one locus ID. And as I mentioned earlier in this thread, the warning
> statement I inserted before making the function call shows me that there is
> only one ID in the $locus variable.
>
> My last resort now is to try as you suggested and modify my GFF3 file and
> embed the -class => 'Gene' into the "Name" attribute. While doing so,
> should I also embed the 'mRNA' class into the "Name" attribute of the mRNA
> feature like so:
>
> chr2    working_models  mRNA    30427563    30429139    .   -   .
> ID=mrna_36255;Parent=gene_35804;Name=mRNA:Medtr2g097580.1;conf_class=F
>
> Subsequently, should I modify the function call to include the 'class':
> my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus, -type =>
> 'gene', -class => 'Gene');
>
> Thank you.
> Vivek
>
>
> On Wed, Feb 1, 2012 at 11:26 AM, Scott Cain <scott at scottcain.net> wrote:
>
>> Hi Vivek,
>>
>> I responded to your original email and I suspect you may have missed
>> it.  I'll copy it below.  Another few things: how does $locus get
>> populated?  Are you sure what you expect to be there is?
>>
>> Also, to answer your question about the other bioperl modules you're
>> using: no, I don't think that's interfering.
>>
>> Scott
>>
>> ------------------------------------------------
>> Hello Vivek,
>>
>> In your GFF3, you don't have any features of class "Gene".  In GFF2,
>> the class was the text string that started the ninth column, like
>> this:
>>
>> chr2   .  gene    30427563    30429139    .   -   .  Gene 35804
>>
>> where the class would be Gene and the name (also called group) would
>> be 35804.  Class is not a particularly well defined concept in GFF3,
>> so the easiest way to restore functionality to your script is to
>> change the call from this:
>>
>>  my ($locus_obj) = $gff_dbh->get_feature_by_name('Gene' => $locus);
>>
>> to this:
>>
>>  my ($locus_obj) = $gff_dbh->get_features_by_name(-name => $locus,
>> -type => 'gene');
>>
>> I believe (though haven't tested it myself in a very long time) that
>> you can embed class in the name of the feature, like this:
>>
>> chr2  .  gene    30427563    30429139    .   -   .
>>  Name=Gene:Medtr2g097580
>>
>> which may or may not be easier, depending on your data and your code base.
>>
>> Scott
>>
>>
>> On Wed, Feb 1, 2012 at 10:50 AM, Vivek Krishnakumar
>> <vivekkrishnakumar at gmail.com> wrote:
>> > Hi Lincoln,
>> >
>> > Thanks very much for your suggestions. Not sure how the single quotation
>> > marks appeared around the $locus variable. But looks like it was only in
>> > the email. Fortunately did not have quotes around the variable in my
>> > original code.
>> >
>> > Now, when I switch over to 'get_features_by_name()', my script does not
>> run
>> > to completion.
>> >
>> > I want to mention that this snippet of code is part of a larger CGI
>> script
>> > that interfaces with the SeqFeature backend DB. When I modify the
>> function
>> > call to $gff_dbh->get_features_by_name($locus), the script just runs
>> > indefinitely and returns absolutely nothing. I did put in a warn
>> statement
>> > to see if the correct locus ID is being passed to the function (I am
>> able
>> > to see the warning message in my apache error log), which seems to be
>> fine.
>> > But the moment it reaches the function call step, the CGI script
>> freezes up
>> > and I am unable to do anything. It just ends up as a rogue process
>> owned by
>> > the 'daemon' user and continues to use up a lot of memory.
>> >
>> > I am using the following BioPerl modules in this CGI script:
>> > use Bio::SeqIO;
>> > use Bio::SearchIO;
>> > use Bio::DB::SeqFeature::Store;
>> > use Bio::SeqFeature::Generic;
>> > use Bio::Graphics;
>> > use Bio::Graphics::Feature;
>> >
>> > Could any of these be interfering with get_features_by_name()?
>> >
>> > Thank you.
>> > Vivek
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain
>> dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research
>>
>
>



More information about the Bioperl-l mailing list