[Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
Cook, Malcolm
MEC at stowers-institute.org
Fri Feb 23 18:46:00 UTC 2007
Lincoln,
OK. I'll do that...
...let's see, a quick squiz at Bio/DB/SeqFeature/Store/ ....
...ok - parse_attributes _looks_ right to me
...so, let's try it
#load a feature into a new database:
bp_seqfeature_load.PLS -dsn 'dbi:mysql:database=test;host=mysql-dev' \
  -create -user test -pass test \
  <(echo -e "J\tA\tPH\t1\t2\t.\t.\t.\tfoo=bar,blat;Name=mec\n")
#It loaded ok. Now, let's print it out in GFF3:
perl -MBio::DB::SeqFeature::Store -e '
  foreach (Bio::DB::SeqFeature::Store->new(
      -dsn => "dbi:mysql:database=test;host=mysql-dev;user=test;password=test"
    )->features(-type => "PH:A")) { print $_->gff3_string . "\n" }'
J A PH 1 2 . . . Name=mec;ID=1;foo=bar,blat
#output looks good to me
Note, I tried loading attributes foo=bar;foo=blat and it came back
foo=bar,blat. So, you can load either way.
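For readers following along, here is a minimal sketch of why both spellings can end up as the same multi-valued attribute. It is not the parse_attributes code from Bio/DB/SeqFeature/Store; parse_column9 and unescape are hypothetical helpers written only for this illustration, and they assume standard percent-escaping in column 9.

```perl
use strict;
use warnings;

# Hypothetical helper: undo percent-escaping such as %2C.
sub unescape {
    my $s = shift;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Hypothetical helper: turn a column 9 string into { tag => [values] }.
sub parse_column9 {
    my ($col9) = @_;
    my %attributes;
    for my $pair (split /;/, $col9) {
        next unless length $pair;
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $value;
        # Split on commas before unescaping, so an escaped comma (%2C)
        # stays inside a single value.
        push @{ $attributes{ unescape($tag) } },
            map { unescape($_) } split /,/, $value;
    }
    return \%attributes;
}

# Both spellings of the attribute accumulate into the same list.
for my $col9 ('foo=bar,blat;Name=mec', 'foo=bar;foo=blat;Name=mec') {
    my $attr = parse_column9($col9);
    print join(' ',
        map { sprintf '%s=[%s]', $_, join('|', @{ $attr->{$_} }) }
        sort keys %$attr), "\n";
}
```

Repeated tags and comma-separated values are both pushed onto one array per tag, which is why the feature comes back as foo=bar,blat either way.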
I'll commit later today.
--Malcolm
________________________________
From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
Sent: Friday, February 23, 2007 11:16 AM
To: Cook, Malcolm
Cc: bioperl list; lstein at cshl.org
Subject: Re: Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
Hi Malcolm,
You're quite right, and I appreciate your work in tracking down
and fixing it. Before you commit the patch, can you confirm that the
loader is working correctly so that comma-separated values are read back
into the data structure as multiple attributes?
Lincoln
On 2/23/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
Lincoln, and other Bio::DB::SeqFeature wanderers:
I find that generating GFF from a Bio::DB::SeqFeature using gff3_string
does not respect the following:
"Multiple attributes of the same type are indicated by separating the
values with the comma "," character" (cf.
http://www.sequenceontology.org/gff3.shtml)
This one-liner demonstrates the problem:
perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(
    -seq_id => "J", -start => 1, -end => 2,
    -primary_tag => "PH", -source => "A", -name => "mec",
    -attributes => {foo => [qw(bar blat)]})->gff3_string'
J A PH 1 2 . . . foo=bar;foo=blat;Name=mec
Do you agree this is a problem?
The fix is in the post-sig patch to
/Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the
stylistic privilege of promoting any ID, Parent, or Name attribute to
the front of column 9, so output is now:
J A PH 1 2 . . . Name=mec;foo=bar,blat
Do you agree this is better?
I am poised to commit it, as well as the functionally identical patch
to the equivalent function in Bio/Graphics/FeatureBase.pm
All clear?
-- Malcolm Cook
*** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25
--- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000
***************
*** 481,494 ****
  next if $t eq 'load_id';
  next if $t eq 'parent_id';
  foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!
! push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
  }
  my $id = $self->primary_id;
  my $name = $self->display_name;
! push @result,"ID=".$self->escape($id) if defined $id;
! push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
! push @result,"Name=".$self->escape($name) if defined $name;
  return join ';',@result;
  }
--- 481,498 ----
  next if $t eq 'load_id';
  next if $t eq 'parent_id';
  foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!
! #push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
! # NO! Multiple attributes of the same type are indicated by
! # separating the values with the comma "," character - per
! # http://www.sequenceontology.org/gff3.shtml. Do it this way:
! push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
  }
  my $id = $self->primary_id;
  my $name = $self->display_name;
! unshift @result,"ID=".$self->escape($id) if defined $id;
! unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
! unshift @result,"Name=".$self->escape($name) if defined $name;
  return join ';',@result;
  }
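The behaviour the patch aims for can be tried outside BioPerl with a small standalone sketch. This is an illustration under assumptions, not the code in NormalizedFeature.pm: format_column9 and escape_gff3 are hypothetical names, and escape_gff3 is only a stand-in for $self->escape that percent-encodes the characters with special meaning in column 9.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $self->escape: percent-encode control
# characters and the characters reserved in column 9.
sub escape_gff3 {
    my $s = shift;
    $s =~ s/([%;=&,\x00-\x1f\x7f])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Hypothetical helper: build column 9 from { tag => [values] } plus
# optional ID, Parent and Name values.
sub format_column9 {
    my ($attributes, %reserved) = @_;
    my @result;
    for my $tag (sort keys %$attributes) {
        my @values = @{ $attributes->{$tag} };
        s/\s+$// for @values;    # get rid of trailing whitespace
        # One tag=value pair per tag, values joined with commas.
        push @result, join '=', escape_gff3($tag),
            join(',', map { escape_gff3($_) } @values);
    }
    # As in the patch, unshift ID, then Parent, then Name, so Name
    # ends up first in the output.
    for my $key (qw(ID Parent Name)) {
        unshift @result, "$key=" . escape_gff3($reserved{$key})
            if defined $reserved{$key};
    }
    return join ';', @result;
}

print format_column9({ foo => [qw(bar blat)] }, Name => 'mec'), "\n";
print format_column9({ foo => [qw(bar blat)] }, Name => 'mec', ID => 1), "\n";
```

Escaping each value before joining keeps a literal comma inside a value (written as %2C) distinct from the comma that separates values.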
--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu