[Bioperl-l] Long /labels are wrapped, but can't be read
Adam Sjøgren
adsj at novozymes.com
Mon Sep 28 07:51:15 UTC 2009
Hi.
I am wondering whether this is a buglet or just a case of "Don't do
that":
If I set a very long /label on a feature and output the sequence in EMBL
format, the qualifier value gets wrapped, but not quoted.
When BioPerl reads such a file, an exception is thrown.
I probably shouldn't be setting very long labels... But shouldn't
BioPerl either throw an exception when an overly long label is set,
automatically quote the value when it is long enough to be wrapped, or
know how to read a wrapped yet unquoted value?
I will be happy to try and provide a patch for whichever solution is
preferred.
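To make the second option (quote only when the value would wrap) concrete, here is a minimal sketch in plain Perl. It does not touch embl.pm: the helper name quote_if_wrapped and the 59-character limit (80 columns minus the 21-character FT prefix) are assumptions for illustration, not existing BioPerl API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: return the qualifier value,
# quoted only if "/name=value" would not fit on a single FT line.
sub quote_if_wrapped {
    my ($name, $value, $max) = @_;
    $max = 59 unless defined $max;    # assumed: 80 columns - 21-char FT prefix
    return $value if length("/$name=$value") <= $max;
    (my $escaped = $value) =~ s/"/""/g;    # double any embedded quotes
    return qq{"$escaped"};
}

# A short label is returned unchanged, a long one comes back quoted.
print quote_if_wrapped(label => 'short'), "\n";
print quote_if_wrapped(
    label => 'averylonglabelthisisindeedbutitoughttoworkanywaydontyouthink'
), "\n";
```

Whether quoting a /label value is acceptable to other EMBL parsers is a separate question; the sketch only shows where the decision could be made.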
Here is an example script:
#!/usr/bin/perl
use strict;
use warnings;
use IO::String;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::SeqIO;
print 'BioPerl ' . $Bio::Root::Version::VERSION . "\n";
my $seq=Bio::Seq->new(-seq=>'ATG');
my $feature=Bio::SeqFeature::Generic->new(-primary=>'misc_feature', -start=>1, -end=>3);
$feature->add_tag_value(label=>'averylonglabelthisisindeedbutitoughttoworkanywaydontyouthink');
$seq->add_SeqFeature($feature);
my $out_string=out($seq);
print $out_string;
my $fh=IO::String->new($out_string);
my $in=Bio::SeqIO->new(-fh=>$fh, -format=>'EMBL');
my $in_seq=$in->next_seq;
print "Done\n";
sub out {
    my ($seq)=@_;
    my $string='';
    my $fh=IO::String->new($string);
    my $out=Bio::SeqIO->new(-fh=>$fh, -format=>'EMBL');
    $out->write_seq($seq);
    return $string;
}
Which gives this output when run:
BioPerl 1.0069
ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 3 BP.
XX
AC   unknown;
XX
XX
FH   Key             Location/Qualifiers
FH
FT   misc_feature    1..3
FT                   /label=averylonglabelthisisindeedbutitoughttoworkanywaydont
FT                   youthink
XX
SQ   Sequence 3 BP; 1 A; 0 C; 1 G; 1 T; 0 other;
atg 3
//
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Can't see new qualifier in: youthink
from:
/label=averylonglabelthisisindeedbutitoughttoworkanywaydont
youthink
STACK: Error::throw
STACK: Bio::Root::Root::throw Bio/Root/Root.pm:368
STACK: Bio::SeqIO::embl::_read_FTHelper_EMBL Bio/SeqIO/embl.pm:1294
STACK: Bio::SeqIO::embl::next_seq Bio/SeqIO/embl.pm:392
STACK: /z/home/adsj/bugs/bioperl/embl/embl.pl:24
-----------------------------------------------------------
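For the third option (reading a wrapped yet unquoted value), the idea could be sketched as follows: treat any qualifier line that does not start with "/" as a continuation of the previous qualifier. This is plain Perl, not a patch to _read_FTHelper_EMBL; the helper name join_wrapped_qualifiers is hypothetical, and the input is assumed to be qualifier lines with the "FT" prefix and leading whitespace already stripped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: merge qualifier lines so that
# any line not starting with "/" is appended to the qualifier before it.
sub join_wrapped_qualifiers {
    my @lines = @_;
    my @qualifiers;
    for my $line (@lines) {
        if ($line =~ m{^/} or !@qualifiers) {
            push @qualifiers, $line;     # start of a new qualifier
        } else {
            $qualifiers[-1] .= $line;    # continuation of the previous one
        }
    }
    return @qualifiers;
}

my @merged = join_wrapped_qualifiers(
    '/label=averylonglabelthisisindeedbutitoughttoworkanywaydont',
    'youthink',
);
print "$_\n" for @merged;
```

The sketch is deliberately naive: a continuation line that happens to begin with "/" would be mistaken for a new qualifier, and joining without a space is only right for values that were split mid-word, as in the example above.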
If I change the value to include double quotes (simulating what would
happen if embl.pm quoted the value), BioPerl can read the EMBL string it
produces without error:
-----------------------------------------------------------
adsj at ala:~/work/bioperl/bioperl-live$ perl -I. ~/bugs/bioperl/embl/embl.pl
BioPerl 1.0069
ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 3 BP.
XX
AC   unknown;
XX
XX
FH   Key             Location/Qualifiers
FH
FT   misc_feature    1..3
FT                   /label=""averylonglabelthisisindeedbutitoughttoworkanywaydo
FT                   ntyouthink""
XX
SQ   Sequence 3 BP; 1 A; 0 C; 1 G; 1 T; 0 other;
atg 3
//
Done
Best regards,
Adam
--
Adam Sjøgren
adsj at novozymes.com