[Bioperl-l] Invalid EMBL files generated in rare circumstances; line wrapping
Fields, Christopher J
cjfields at illinois.edu
Mon Sep 29 15:41:10 UTC 2014
I can reproduce that on the master branch. I think it's a weird side effect of the text wrapping: if you remove the space near the end of the string of X's and let the module wrap the line itself, it works fine. Frankly, I don't think we've ever run into it before.
If possible, can you file it as a bug on GitHub?
On Sep 29, 2014, at 10:17 AM, Adam Sjøgren <adsj at novozymes.com> wrote:
> If you craft a tag on a feature sneakily (or if you are unlucky),
> Bio::SeqIO will create invalid EMBL, separating the "/" from the
> qualifier name:
> ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 4 BP.
> AC   unknown;
> FH   Key             Location/Qualifiers
> FT   CDS             1..4
> FT                   /
> FT                   note="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> FT                   X"
> SQ   Sequence 4 BP; 1 A; 1 C; 1 G; 1 T; 0 other;
>      actg                                                                      4
> In this example, "/" and "note" are on separate lines, which is wrong; at
> least, BioPerl itself does not accept it.
> Here is a script to create the above output (BioPerl 1.6.901 used):
> use strict;
> use warnings;
> use Bio::Seq::RichSeq;
> use Bio::SeqFeature::Generic;
> use IO::String;
> use Bio::SeqIO;
> my $seq=Bio::Seq::RichSeq->new(-display_id=>'TEST', -seq=>'actg');
> my $cds=Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', -start=>1, -end=>4);
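> $cds->add_tag_value(note=>'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX X');
> $seq->add_SeqFeature($cds);  # attach the CDS feature to the sequence
> my $string;
> my $str=IO::String->new($string);
> my $io=Bio::SeqIO->new(-fh=>$str, -format=>'embl');
> $io->write_seq($seq);  # write the EMBL record into $string
> print $string;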
> Changing the position of the space in the note changes whether the problem occurs.
> Maybe there is a bug lurking in the line wrapping/formatting code.
> Does this sound like a bug to anyone else?
> Best regards,
> Adam Sjøgren
> adsj at novozymes.com