[Bioperl-l] possible bug printing GenBank feature qualfiers
Scott Markel
smarkel at scitegic.com
Fri Mar 31 20:48:24 UTC 2006
Chris,
I get clean runs with both a Windows build of Perl
This is perl, v5.8.7 built for MSWin32-x86-multi-thread
and a cygwin build
This is perl, v5.8.6 built for cygwin-thread-multi-64int
My test input file should be okay. I started with a valid
file from NCBI and trimmed. misc_feature is allowed to
have qualifiers. To be DDBJ/EMBL/GenBank Feature Table
compliant, we could change "foo" to "note", but I don't think
BioPerl is doing any checks for valid qualifier names.
Scott
Chris Fields wrote:
> Well, I'm running off bioperl-live now using WinXP and latest ActivePerl
> (5.8.8.817) and, although all tests pass (SeqIO and genbank), I keep getting
> the same error and erroneous output using Scott's (albeit very simple)
> sequence example. The problem seems to pop up on the input end, not output
> (the 'no feature key' in the below output only shows up when -verbose is
> turned on with the input SeqIO object).
>
> Questions:
>
> 1) Is the below example Scott gave valid GenBank format? I don't know, but
> it looks okay.
> 2) If so, should it work? Yes, no question.
> 3) And if it is supposed to, why isn't it working here? Don't know, but any
> of the mentioned fixes don't do anything (get rid of the error) for me.
> Scott gets it to work but my guess is that it is b/c he's using a Linux/UNIX
> flavor. Can't wait 'til I get my MacTel (4 more months....)
>
> I'm personally not too worried about it at the moment as anything I passed
> through SeqIO has worked w/o a problem, even on WinXP. It's just a bit
> frustrating to see something fail here that seems to work elsewhere.
>
> So here's what I did:
>
> input.gbff
> ====================================
> LOCUS MY_LOCUS 10 aa linear UNK
> DEFINITION my description.
> ACCESSION 12345
> FEATURES Location/Qualifiers
> misc_feature 1..10
> /foo="0"
> ORIGIN
> 1 atggagaact
> //
> ====================================
>
> Run through this:
> ====================================
> use Bio::SeqIO;
>
> my $inputFilename = "input.gbff";
> my $outputFilename = "output.gbff";
>
> my $in = Bio::SeqIO->new(-verbose => 1,
> -file => $inputFilename,
> -format => "genbank");
>
> my $out = Bio::SeqIO->new(-verbose => 0,
> -file => ">$outputFilename",
> -format => "genbank");
>
> my $sequence = $in->next_seq();
> $out->write_seq($sequence);
> ====================================
>
> Gets this error:
>
> ====================================
> C:\Perl\Scripts\gb_test>test.pl
> no feature key!
>
> -------------------- WARNING ---------------------
> MSG: Unexpected error in feature table for Skipping feature, attempting to
> recover
> STACK Bio::SeqIO::genbank::next_seq
> C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583
> STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18
> sequence length is 10
> ====================================
>
> And this output, which is missing the feature, somewhat expected judging
> from the error (output.gbff)
>
> ====================================
> LOCUS MY_LOCUS 10 aa linear linear
> DEFINITION my description.
> ACCESSION 12345
> KEYWORDS .
> FEATURES Location/Qualifiers
> ORIGIN
> 1 atggagaact
> //
> ====================================
>
> I wouldn't be a bit surprised if it is a WinXP-specific issue, so I'll give
> it a try this weekend on Mac OS X using the latest CVS to see what happens.
>
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>>-----Original Message-----
>>From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar
>>Lapp
>>Sent: Friday, March 31, 2006 1:51 PM
>>To: Chris Fields
>>Cc: Scott Markel; bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] possible bug printing GenBank feature qualfiers
>>
>>The only problem with Heikki's version of the line is that if the
>>value is undefined you get a (ugly) warning from perl stating that you
>>printed an undefined value. Since normally tags should always have a
>>value (even if an empty string) this is a rather theoretical issue.
>>
>> -hilmar
>>
>>On 3/31/06, Chris Fields <cjfields at uiuc.edu> wrote:
>>
>>>Sorry about that; stupid Outlook sent my mail before I had a chance to
>>>finish it up.
>>>
>>>The Bio::Annotation::Simple fix sounds best, but the problem is that CVS
>>>shows a fix on this line by Heikki after 1.5.1 was released:
>>>
>>> fix to allow 0 values despite operator overload (Paul Mooney)
>>>
>>>which changed the overload to:
>>>
>>> use overload '""' => sub { $_[0]->value};
>>>
>>>I'll try out your fix here to see if it breaks anything (can't see why
>>
>>it
>>
>>>would), but I may need to dig through the archives a little to see why
>>
>>this
>>
>>>latest change was made. If everything works and passes tests I'll roll
>>
>>back
>>
>>>the commit I made to Bio::SeqIO::genbank earlier today.
>>>
>>>Christopher Fields
>>>Postdoctoral Researcher - Switzer Lab
>>>Dept. of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>>-----Original Message-----
>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
>>>>Sent: Friday, March 31, 2006 12:23 PM
>>>>To: Scott Markel
>>>>Cc: bioperl-l at lists.open-bio.org; Chris Fields
>>>>Subject: Re: [Bioperl-l] possible bug printing GenBank feature
>>
>>qualfiers
>>
>>>>Scott,
>>>>
>>>>your fix assumes that $value in reality is not a scalar but a hash ref
>>>>and that it has a key "value".
>>>>
>>>>Apparently in your test environment this is all indeed true, but there
>>>>is no guarantee that this will still be true tomorrow when you next
>>>>update from CVS (or install a new version).
>>>>
>>>>It seems to me that making feature tag values Bio::AnnotationI objects
>>>>and the stringification overload is what is interfering here. More
>>>>specifically, the broken overload in Bio::Annotation::SimpleValue
>>>>
>>>> use overload '""' => sub { $_[0]->value || ''};
>>>>
>>>>will lead exactly to the behavior you see (b/c $_[0]->value evaluates
>>>>to false if the value is '0').
>>>>
>>>>You say you build and populate the feature dynamically - are you using
>>>>Bio::SeqFeature::Annotated for this? Bio::SeqFeature::Generic is
>>
>>slated
>>
>>>>to get this behavior reverted, i.e., will return to using scalars for
>>>>tag values. (Or so I recall ...)
>>>>
>>>>To fix the problem for you now, I suggest you either fix the overload
>>>>statement above to be
>>>>
>>>> use overload '""' => sub { defined($_[0]->value) ? $_[0]->value
>>
>>: ''
>>
>>>>};
>>>>
>>>>I suppose this should in fact be committed to the repository - does
>>>>anybody see any damage from this change?
>>>>
>>>>Or, if you do want to mess with the GenBank format writer, protect the
>>>>conversion to string and use the object access method:
>>>>
>>>> if (ref($value) && $value->isa("Bio::Annotation::SimpleValue"))
>>
>>{
>>
>>>> # convert SimpleValue object to represented (string)
>>
>>value
>>
>>>> $value = $value->value;
>>>> }
>>>>
>>>>Hth,
>>>>
>>>> -hilmar
>>>>
>>>>On Mar 31, 2006, at 9:31 AM, Scott Markel wrote:
>>>>
>>>>
>>>>>Chris,
>>>>>
>>>>>Looks like I made my test case too simple. In our application,
>>>>>which calls BioPerl, I'm creating the feature with the zero-
>>>>>valued qualifier. It's not being read in from a file, so
>>>>>my only issue is with writing GenBank files. The real feature
>>>>>is one for a primer binding site. The qualifier contains the
>>>>>number of mismatches. The one line change of
>>>>>
>>>>> $value = $value->{"value"}
>>>>>
>>>>>definitely fixes our problem and causes no regression
>>>>>failures in our application.
>>>>>
>>>>>Scott
>>>>>
>>>>>Chris Fields wrote:
>>>>>
>>>>>
>>>>>>I tried this on WinXP (I'm using bioperl-live) and got a warning:
>>>>>>
>>>>>>-------------------- WARNING ---------------------
>>>>>>MSG: Unexpected error in feature table for Skipping feature,
>>>>>>attempting to
>>>>>>recover
>>>>>>---------------------------------------------------
>>>>>>
>>>>>>Running using debugging shows that no feature key was found in
>>>>>>_read_FTHelper_GenBank. So I'm getting an error, but on input not
>>>>>>output.
>>>>>>In fact, turning on -verbose in the SeqIO input object gives the
>>>>>>below extra
>>>>>>output, whereas turning -verbose on only in the output object just
>>>>>>gives the
>>>>>>warning above.
>>>>>>
>>>>>>====================================
>>>>>>C:\Perl\Scripts\gb_test>test.pl
>>>>>>no feature key!
>>>>>>
>>>>>>-------------------- WARNING ---------------------
>>>>>>MSG: Unexpected error in feature table for Skipping feature,
>>>>>>attempting to
>>>>>>recover
>>>>>>STACK Bio::SeqIO::genbank::next_seq
>>>>>>C:\Perl\src\bioperl\core/Bio\SeqIO\genbank.pm:583
>>>>>>STACK toplevel C:\Perl\Scripts\gb_test\test.pl:18
>>>>>>sequence length is 10
>>>>>>====================================
>>>>>>
>>>>>>The sequence came back w/o any features in the feature table, which
>>>>>>is what
>>>>>>I would expect from this error:
>>>>>>====================================
>>>>>>LOCUS MY_LOCUS 10 aa linear
>>
>>linear
>>
>>>>>>DEFINITION my description.
>>>>>>ACCESSION 12345
>>>>>>KEYWORDS .
>>>>>>FEATURES Location/Qualifiers
>>>>>>ORIGIN
>>>>>> 1 atggagaact
>>>>>>//
>>>>>>====================================
>>>>>>
>>>>>>Adding the extra line before the s/// didn't help any (warning
>>
>>still
>>
>>>>>>pops
>>>>>>up, no change in output). Anybody out there with any ideas?
>>>>>>
>>>>>>Christopher Fields
>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>Dept. of Biochemistry
>>>>>>University of Illinois Urbana-Champaign
>>>>>>
>>>>>>
>>>>>>
--
Scott Markel, Ph.D.
Principal Bioinformatics Architect email: smarkel at scitegic.com
SciTegic Inc. mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253
San Diego, CA 92123 fax: +1 858 279 8804
USA web: http://www.scitegic.com
More information about the Bioperl-l
mailing list