[Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
Lincoln Stein
lstein at cshl.edu
Fri Feb 23 17:16:01 UTC 2007
Hi Malcolm,
You're quite right, and I appreciate your work in tracking down and fixing
it. Before you commit the patch, can you confirm that the loader is working
correctly so that comma-separated values are read back into the data
structure as multiple attributes?
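
For concreteness, here is a minimal sketch of the round trip in question. It is not the actual loader or gff3_string code: format_attributes, parse_attributes, escape_value and unescape_value are hypothetical helpers, and the escaping is a simplified stand-in for the feature's escape method.

```perl
use strict;
use warnings;

# Simplified stand-in for the feature's escape method (an assumption):
# percent-encode the characters that are structural in column 9.
sub escape_value {
    my ($s) = @_;
    $s =~ s/([%;=&,\t\n\r])/sprintf("%%%02X", ord($1))/ge;
    return $s;
}

sub unescape_value {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Hypothetical helper: emit one tag=value1,value2 pair per tag.
sub format_attributes {
    my ($attrs) = @_;
    return join ';', map {
        my $tag = $_;
        escape_value($tag) . '='
            . join(',', map { escape_value($_) } @{ $attrs->{$tag} });
    } sort keys %$attrs;
}

# Hypothetical helper: split on ';', then '=', then ',' BEFORE unescaping,
# so an escaped comma (%2C) inside a value is not treated as a separator.
sub parse_attributes {
    my ($col9) = @_;
    my %attrs;
    for my $pair (split /;/, $col9) {
        my ($tag, $values) = split /=/, $pair, 2;
        next unless defined $values;
        push @{ $attrs{ unescape_value($tag) } },
            map { unescape_value($_) } split /,/, $values;
    }
    return \%attrs;
}

my $col9   = format_attributes({ foo => [qw(bar blat)], Name => ['mec'] });
my $parsed = parse_attributes($col9);
print "$col9\n";                             # Name=mec;foo=bar,blat
print join('|', @{ $parsed->{foo} }), "\n";  # bar|blat
```

The point of the check is the second print: a value list written as foo=bar,blat should come back as two separate foo values, not as the single string "bar,blat".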
Lincoln
On 2/23/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
>
> Lincoln, and other Bio::DB::SeqFeature wanderers:
>
> I find that generating GFF from a Bio::DB::SeqFeature using gff3_string
> does not respect the following:
>
> "Multiple attributes of the same type are indicated by separating the
> values with the comma "," character" (cf.
> http://www.sequenceontology.org/gff3.shtml)
>
> This one-liner demonstrates the problem:
>
> perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id =>
> "J", -start => 1, -end => 2, -primary_tag => "PH", -source => "A",
> -name => "mec", -attributes => {foo => [qw(bar blat)]})->gff3_string'
> J A PH 1 2 . . . foo=bar;foo=blat;Name=mec
>
> Do you agree this is a problem?
>
> The fix is in the post-sig patch to
> /Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the
> stylistic privilege of promoting any ID, Parent, or Name attribute to
> the front of column 9, so output is now:
>
> J A PH 1 2 . . . Name=mec;foo=bar,blat
>
> Do you agree this is better?
>
> I am poised to commit it, as well as the functionally identical patch to the
> equivalent function in Bio/Graphics/FeatureBase.pm
>
> All clear?
>
> -- Malcolm Cook
>
>
>
> *** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25
> --- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000
> ***************
> *** 481,494 ****
> next if $t eq 'load_id';
> next if $t eq 'parent_id';
> foreach (@values) { s/\s+$// } # get rid of trailing whitespace
> !
> ! push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
> }
> my $id = $self->primary_id;
> my $name = $self->display_name;
> ! push @result,"ID=".$self->escape($id) if defined $id;
> ! push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
> ! push @result,"Name=".$self->escape($name) if defined $name;
> return join ';',@result;
> }
>
> --- 481,498 ----
> next if $t eq 'load_id';
> next if $t eq 'parent_id';
> foreach (@values) { s/\s+$// } # get rid of trailing whitespace
> !
> ! # push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
> ! # NO! Multiple attributes of the same type are indicated by
> ! # separating the values with the comma "," character - per
> ! # http://www.sequenceontology.org/gff3.shtml. Do it this way:
> ! push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
> }
> my $id = $self->primary_id;
> my $name = $self->display_name;
> ! unshift @result,"ID=".$self->escape($id) if defined $id;
> ! unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
> ! unshift @result,"Name=".$self->escape($name) if defined $name;
> return join ';',@result;
> }
>
>
>
>
--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu