[Bioperl-l] Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase

Cook, Malcolm MEC at stowers-institute.org
Wed May 2 22:38:31 UTC 2007


Lincoln, 
 
Here for your comment and review is a very reworked version of
Bio::Graphics::FeatureBase->gff3_string.
 
The main difference is to that homogenous children get ALL their
attributes except for start/stop from the parent, including the group.
I also provide option as to whether or now to "remove extraneous level
of parentage" called $preserveHomegenousParent.
 
There is an in-line comment and question for you in the code body.
 
It works well in my hands to my use cases, but, I'm not positive it is
in the spirit of your intentions.
 
Cheers,
 
Malcolm
 
 
sub gff3_string {
  my ($self, $recurse, $preserveHomegenousParent,
 
      # Note: the following parameters, whose name begins with '$_',
      # are intended for recursive call only.
 
      $_parent,
      $_self_is_hsf,  # is $self the child in a homogeneous parent/child
relationship?
      $_hsf_parentgroup, # if so, what is the group (GFF column 9) of
the parent
     ) = @_;
 
  # PURPOSE: Return GFF3 format for the feature $self.  Optionally
  # $recurse to include GFF for any subfeatures of the feature. If
  # recursing, provide special handling to "remove an extraneous level
  # of parentage" (unless $preserveHomegenousParent) for features
  # which have subfeatures all of whose types are the same as the
  # feature itself (the "homogenous parent/child" case). This usage is
  # a convention for representing discontiguous features; they may be
  # created by using the -segment directive without specifying a
  # distinct -subtype in to `new` when creating a
  # Bio::Graphics::FeatureBase (i.e.  Bio::DB::SeqFeature,
  # Bio::Graphics::Feature).  Such homogenous subfeatures created in
  # this fashion DO NOT have the parent (GFF column 9) attributes
  # propogated to them; so, since they are all part of the same
  # parent, the ONLY difference relevant to GFF production SHOULD be
  # the $start and $end coordinates for their segment, and ALL THIER
  # OTHER ATTRIBUTES should be copied down from the parent (including:
  # strand, score, Name, ID, Parent, etc).
 

  my $hparentORself = $_self_is_hsf ? $_parent : $self; # $self's
parent, if it is a homogenous child, otherwise $self.
 
  if ($recurse &&  (my @ssf = $self->sub_SeqFeature)) {
    my $homogenous = ! grep {$_->type ne $self->type} @ssf; # will be
TRUE only if  all subfeatures are the same type as $self.
    my $mygroup =
      # compute $self's group if it is needed to be passed down to
      # subfeatures, unless it is already being passed down (in which
      # case there are (at least) 3 levels of homogenous parent child
      # (will this ever happen in practice???))
      ! $homogenous ? '' : $_self_is_hsf ? $_hsf_parentgroup :
$self->format_attributes($_parent); 
    return (join("\n", (($preserveHomegenousParent ?
($self->gff3_string(0)) : ()) , map
{$_->gff3_string($recurse,$preserveHomegenousParent,$hparentORself,$homo
genous,$mygroup)} @ssf)));
  } else {
    my $name  = $hparentORself->name;
    my $class = $hparentORself->class;
    my $group = $_self_is_hsf ? $_hsf_parentgroup :
$self->format_attributes($_parent);
    my $strand = ('-','.','+')[$self->strand+1]; 
    # TODO: understand conditions under which this could be other than
    # hparentORself->strand.  In particular, why does add_segment flip
    # the strand when start > stop?  I thought this was not allowed!
    # Lincoln - any ideas?
    my $p      = join("\t",
 
$hparentORself->ref||'.',$hparentORself->source||'.',$hparentORself->met
hod||'.',
        $self->start||'.',$self->stop||'.',
        defined($hparentORself->score) ? $hparentORself->score : '.',
        $strand||'.',
        defined($hparentORself->phase) ? $hparentORself->phase : '.',
        $group||'');
  }
}
 



________________________________

	From: Cook, Malcolm 
	Sent: Friday, April 27, 2007 1:45 PM
	To: 'lincoln.stein at gmail.com'
	Cc: 'lstein at cshl.org'; 'bioperl list'
	Subject: RE: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
	
	
	Hi Lincoln,
	 
	Cool.
	 
	The principal of what I figured out I still think holds but the
implementation is slightly broke.  Improved patch forthoming next week.
	 

	Malcolm Cook
	Database Applications Manager - Bioinformatics
	Stowers Institute for Medical Research - Kansas City, Missouri
	  

	 


________________________________

		From: lincoln.stein at gmail.com
[mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
		Sent: Friday, April 27, 2007 12:45 PM
		To: Cook, Malcolm
		Cc: lstein at cshl.org; bioperl list
		Subject: Re: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
		
		
		Hi Malcom,
		
		This is absolutely ok and you can go ahead and commit.
Thanks for figuring this out!
		
		Lincoln
		
		
		On 4/26/07, Cook, Malcolm < MEC at stowers-institute.org
<mailto:MEC at stowers-institute.org> > wrote: 

			Lincoln, et al,
			
			I find that the gff3_string for
Bio::DB::SeqFeature objects retreived 
			from a Bio::DB::SeqFeature::Store that were
initially created with
			-seqments (i.e. whose location was
discontiguous) does not display any
			other attributes in column 9 than "Name".
			
			What do you think of the following patch to
Bio::Graphics::FeatureBase, 
			whose effect is to "contrive to return
(duplicated) common group values"
			(which otherwise get lost when "collapsing"
"homogenous" parent/child
			features)
			
			Another approach would be to copy the attributes
from the parent to the 
			children when the -seqments are first created.
			
			Another approach would be to use
Bio::SeqFeature::Generic  as the db's
			-seqfeature_class and save with -location being
a Bio::Location::Split,
			but this was wrougth with other problems. 
			
			Any other suggestions?  Do you want me to commit
this patch?
			
			Cheers,
			
			Malcolm
			
			Patch follows:
			
			
			
			
			Index: FeatureBase.pm
	
=================================================================== 
			RCS file:
	
/home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
			retrieving revision 1.29
			diff -c -r1.29 FeatureBase.pm
			*** FeatureBase.pm      16 Apr 2007 19:55:33
-0000      1.29
			--- FeatureBase.pm       26 Apr 2007 16:30:23
-0000
			***************
			*** 581,587 ****
			      foreach (@children) {
			        s/Parent=/ID=/g;
			      } # replace Parent tag with ID
			!     return join "\n", at children;
			    }
			
			    return join("\n",$p, at children);
			--- 581,589 ----
			      foreach (@children) {
			        s/Parent=/ID=/g;
			      } # replace Parent tag with ID
			!     #return join "\n", at children; 
			!     # Instead of above, additionally, contrive
to return (duplicated)
			common group values
			!     return(join("$group\n", at children) .
$group);
			    }
			
			    return join("\n",$p, at children); 
			




		-- 
		Lincoln D. Stein
		Cold Spring Harbor Laboratory
		1 Bungtown Road
		Cold Spring Harbor, NY 11724
		(516) 367-8380 (voice)
		(516) 367-8389 (fax)
		FOR URGENT MESSAGES & SCHEDULING, 
		PLEASE CONTACT MY ASSISTANT, 
		SANDRA MICHELSEN, AT michelse at cshl.edu 





More information about the Bioperl-l mailing list