[Bioperl-l] Re: problems with Bio::Tools::GFF

Scott Cain cain at cshl.org
Mon Nov 3 14:33:52 EST 2003


On Mon, 2003-11-03 at 14:13, Jason Stajich wrote:
> Feel free to fix it to spec Scott.

Will do--I mentioned it because I am always concerned that I am
misinterpreting the spec; if I codify my misinterpretations, that would
kind of shoot the idea of a standard out the window.
> 
> Note that I have also made no attempt to parse/write the Gap or Alignment
> stuff in any sort of special way - I basically made it so it supports what
> GFF2 currently looks like only in GFF3 flavor.  Perhaps it makes sense to
> do all of that work on Chris's Unflattener though rather than in
> Tools::GFF.  A SeqFeature::Tools::Flattener is probably in order as well to
> turn HSPs and other paired sequences into GFF3 Alignments.

I'm not sure it's necessary to move to Unflattener.  Since the format is
fairly simple, it is only really necessary to split the information in
the groups column into tag/value pairs and let the user decide what to do
with the information.  The only thing that I am somewhat at a loss to
deal with is CIGAR line info, but I don't think that is being parsed by
Bio::DB::GFF yet either.
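
To make the "split the groups column into tag/value pairs" idea concrete,
here is a minimal sketch. It is written in Python purely for illustration
and is not Bio::Tools::GFF code; the function name parse_group_column is
hypothetical. It assumes fields are separated by a bare ";", that each
field has the form tag=value, that multiple values are comma separated,
and that reserved characters (including spaces) are URL-escaped, which is
my reading of the spec and may need adjusting.

```python
from typing import Dict, List
from urllib.parse import unquote


def parse_group_column(group: str) -> Dict[str, List[str]]:
    """Split a GFF3-style group column into {tag: [values]}.

    Hypothetical helper: assumes ';' separates fields, '=' separates a
    tag from its values, ',' separates values, and %XX escapes are used.
    """
    pairs: Dict[str, List[str]] = {}
    for field in group.split(";"):
        if not field:
            continue  # tolerate an empty field, e.g. from a trailing ';'
        tag, sep, value = field.partition("=")
        if not sep:
            raise ValueError("no '=' in field %r" % field)
        # Unescape only after splitting, so an escaped ',' stays in a value.
        values = [unquote(v) for v in value.split(",")]
        pairs.setdefault(unquote(tag), []).extend(values)
    return pairs
```

The caller gets every tag with its list of values and can decide what to
do with them; nothing here tries to interpret Gap or Target values.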
> 
> As for the seq stuff - will likely need a Bio::SeqIO::gff3 for that.
> 
Ouch--I was afraid you were going to suggest that.  I suppose if we make
it a read-only module, that should be OK.  The thought of making
it write makes my head hurt.
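
A rough sketch of what the read-only side might look like, again in
Python for illustration only and not a proposal for the actual
Bio::SeqIO::gff3 interface. The function name
split_features_and_sequences is hypothetical, and it assumes the
sequence section is introduced by a "##FASTA" line and is in ordinary
FASTA format from there to the end of the file.

```python
from typing import Dict, Iterable, List, Tuple


def split_features_and_sequences(
    lines: Iterable[str],
) -> Tuple[List[str], Dict[str, str]]:
    """Return (feature lines, {sequence id: sequence}) from GFF3 lines.

    Assumption: everything after a '##FASTA' line is FASTA-formatted
    sequence; everything before it is features, comments, or directives.
    """
    features: List[str] = []
    sequences: Dict[str, str] = {}
    current_id = None
    in_fasta = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not in_fasta:
            if line.startswith("##FASTA"):
                in_fasta = True
            elif line and not line.startswith("#"):
                features.append(line)  # skip comments and other directives
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            sequences[current_id] = ""
        elif line and current_id is not None:
            sequences[current_id] += line.strip()
    return features, sequences
```

Keeping the two parts separate like this sidesteps the problem that the
sequence is not a feature: the feature lines can still go through
next_feature, and the sequences are handed back on their own.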

> Anyone is welcome to add these changes - I don't think I'll be able to
> make many contributions until December so it would be best if someone else
> took it on.
> 
> -jason
> 
> On Mon, 3 Nov 2003, Scott Cain wrote:
> 
> > Hi Jason and Lincoln,
> >
> > I have a few concerns with Bio::Tools::GFF. The first is with the method
> > _from_gff3_string, which does a split on \t to separate columns.  I
> > think the GFF3 spec says it can be space delimited, so that should
> > probably be \s+.  Additionally, to split the groups column, it uses
> > \s*;\s*, but I think that spaces have to be escaped; therefore, it
> > should only split on ; and spaces would indicate a problem (especially
> > if one splits on spaces as indicated above).
> >
> > Finally, it doesn't provide a method of accessing the sequence that is
> > optionally at the bottom of the file.  I am not exactly sure how to
> > implement that (or I would), but I suspect it will have to be handled in
> > the next_feature method.  Of course, the problem with handling it there
> > is that it is not a feature.
> >
> > Scott
> >
> >
> 
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


