[Bioperl-l] Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase
Cook, Malcolm
MEC at stowers-institute.org
Wed May 2 22:38:31 UTC 2007
Lincoln,
Here for your comment and review is a very reworked version of
Bio::Graphics::FeatureBase->gff3_string.
The main difference is to that homogenous children get ALL their
attributes except for start/stop from the parent, including the group.
I also provide option as to whether or now to "remove extraneous level
of parentage" called $preserveHomegenousParent.
There is an in-line comment and question for you in the code body.
It works well in my hands to my use cases, but, I'm not positive it is
in the spirit of your intentions.
Cheers,
Malcolm
sub gff3_string {
my ($self, $recurse, $preserveHomegenousParent,
# Note: the following parameters, whose name begins with '$_',
# are intended for recursive call only.
$_parent,
$_self_is_hsf, # is $self the child in a homogeneous parent/child
relationship?
$_hsf_parentgroup, # if so, what is the group (GFF column 9) of
the parent
) = @_;
# PURPOSE: Return GFF3 format for the feature $self. Optionally
# $recurse to include GFF for any subfeatures of the feature. If
# recursing, provide special handling to "remove an extraneous level
# of parentage" (unless $preserveHomegenousParent) for features
# which have subfeatures all of whose types are the same as the
# feature itself (the "homogenous parent/child" case). This usage is
# a convention for representing discontiguous features; they may be
# created by using the -segment directive without specifying a
# distinct -subtype in to `new` when creating a
# Bio::Graphics::FeatureBase (i.e. Bio::DB::SeqFeature,
# Bio::Graphics::Feature). Such homogenous subfeatures created in
# this fashion DO NOT have the parent (GFF column 9) attributes
# propogated to them; so, since they are all part of the same
# parent, the ONLY difference relevant to GFF production SHOULD be
# the $start and $end coordinates for their segment, and ALL THIER
# OTHER ATTRIBUTES should be copied down from the parent (including:
# strand, score, Name, ID, Parent, etc).
my $hparentORself = $_self_is_hsf ? $_parent : $self; # $self's
parent, if it is a homogenous child, otherwise $self.
if ($recurse && (my @ssf = $self->sub_SeqFeature)) {
my $homogenous = ! grep {$_->type ne $self->type} @ssf; # will be
TRUE only if all subfeatures are the same type as $self.
my $mygroup =
# compute $self's group if it is needed to be passed down to
# subfeatures, unless it is already being passed down (in which
# case there are (at least) 3 levels of homogenous parent child
# (will this ever happen in practice???))
! $homogenous ? '' : $_self_is_hsf ? $_hsf_parentgroup :
$self->format_attributes($_parent);
return (join("\n", (($preserveHomegenousParent ?
($self->gff3_string(0)) : ()) , map
{$_->gff3_string($recurse,$preserveHomegenousParent,$hparentORself,$homo
genous,$mygroup)} @ssf)));
} else {
my $name = $hparentORself->name;
my $class = $hparentORself->class;
my $group = $_self_is_hsf ? $_hsf_parentgroup :
$self->format_attributes($_parent);
my $strand = ('-','.','+')[$self->strand+1];
# TODO: understand conditions under which this could be other than
# hparentORself->strand. In particular, why does add_segment flip
# the strand when start > stop? I thought this was not allowed!
# Lincoln - any ideas?
my $p = join("\t",
$hparentORself->ref||'.',$hparentORself->source||'.',$hparentORself->met
hod||'.',
$self->start||'.',$self->stop||'.',
defined($hparentORself->score) ? $hparentORself->score : '.',
$strand||'.',
defined($hparentORself->phase) ? $hparentORself->phase : '.',
$group||'');
}
}
________________________________
From: Cook, Malcolm
Sent: Friday, April 27, 2007 1:45 PM
To: 'lincoln.stein at gmail.com'
Cc: 'lstein at cshl.org'; 'bioperl list'
Subject: RE: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
Hi Lincoln,
Cool.
The principal of what I figured out I still think holds but the
implementation is slightly broke. Improved patch forthoming next week.
Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
________________________________
From: lincoln.stein at gmail.com
[mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
Sent: Friday, April 27, 2007 12:45 PM
To: Cook, Malcolm
Cc: lstein at cshl.org; bioperl list
Subject: Re: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
Hi Malcom,
This is absolutely ok and you can go ahead and commit.
Thanks for figuring this out!
Lincoln
On 4/26/07, Cook, Malcolm < MEC at stowers-institute.org
<mailto:MEC at stowers-institute.org> > wrote:
Lincoln, et al,
I find that the gff3_string for
Bio::DB::SeqFeature objects retreived
from a Bio::DB::SeqFeature::Store that were
initially created with
-seqments (i.e. whose location was
discontiguous) does not display any
other attributes in column 9 than "Name".
What do you think of the following patch to
Bio::Graphics::FeatureBase,
whose effect is to "contrive to return
(duplicated) common group values"
(which otherwise get lost when "collapsing"
"homogenous" parent/child
features)
Another approach would be to copy the attributes
from the parent to the
children when the -seqments are first created.
Another approach would be to use
Bio::SeqFeature::Generic as the db's
-seqfeature_class and save with -location being
a Bio::Location::Split,
but this was wrougth with other problems.
Any other suggestions? Do you want me to commit
this patch?
Cheers,
Malcolm
Patch follows:
Index: FeatureBase.pm
===================================================================
RCS file:
/home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
retrieving revision 1.29
diff -c -r1.29 FeatureBase.pm
*** FeatureBase.pm 16 Apr 2007 19:55:33
-0000 1.29
--- FeatureBase.pm 26 Apr 2007 16:30:23
-0000
***************
*** 581,587 ****
foreach (@children) {
s/Parent=/ID=/g;
} # replace Parent tag with ID
! return join "\n", at children;
}
return join("\n",$p, at children);
--- 581,589 ----
foreach (@children) {
s/Parent=/ID=/g;
} # replace Parent tag with ID
! #return join "\n", at children;
! # Instead of above, additionally, contrive
to return (duplicated)
common group values
! return(join("$group\n", at children) .
$group);
}
return join("\n",$p, at children);
--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu
More information about the Bioperl-l
mailing list