[Bioperl-l] Re: [SO-devel] GFF3 - Bioperl - SO
Allen Day
allenday at ucla.edu
Thu Oct 14 13:05:09 EDT 2004
On Thu, 14 Oct 2004, Steffen Grossmann wrote:
> Dear Allen,
>
> I meanwhile understood that Bio::Tools::GFF in connection with
> Bio::SeqFeature::Tools::IDHandler is doing a lot of the stuff I'd like
> to have. Somehow, Bio::FeatureIO::gff seems to be a parallel development
> to the first alternative. I don't know to what extent there are plans to
I don't see from a quick look at the source how Bio::Tools::GFF is related
to Bio::SeqFeature::Tools::IDHandler. Nothing prevents using the
IDHandler in Bio::FeatureIO::gff; in fact, it sounds like what you've
proposed to add, so part of your work is already done.
> bring the two approaches together, but at the moment that seems to be
> more complicated than just bringing one approach to an acceptable state.
> Since the Bio::Tools::GFF approach seems to be ahead, I will focus on it
> at the moment.
I would advise against adding more features into Bio::Tools::GFF. I can't
speak for all others, but my future development will not use it, and I'm
in the process of converting code which does use it to depend on
Bio::FeatureIO::gff.
-Allen
>
> Nevertheless, I attach a small patch which fixes the problem of features
> being created from the lines after the ##FASTA directive.
>
> --- patch for Bio::FeatureIO::gff.pm starts here
> 251a252,255
> > while( my $gff_string = $self->_readline() ) {
> > # we just consume the rest of the file...
> > }
> >
> --- patch ends here
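
A minimal sketch of the behaviour the patch is after, written in Python rather than Perl and independent of Bio::FeatureIO::gff. The function name feature_lines is an assumption made for illustration. Unlike the patch, which reads and discards the rest of the file, this sketch simply stops reading once it sees ##FASTA.

```python
from typing import Iterable, Iterator


def feature_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield GFF3 feature lines, stopping at the ##FASTA directive.

    Hypothetical helper: everything after ##FASTA is sequence data,
    so no feature is ever built from it.
    """
    for line in lines:
        if line.startswith("##FASTA"):
            return  # no annotations may follow this directive
        if line.strip() and not line.startswith("#"):
            # Skip blank lines, comments and other directives.
            yield line.rstrip("\n")
```

Wrapping an open filehandle in feature_lines(...) would be enough to keep the sequence section away from whatever code turns lines into features.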
>
> Maybe you can add it, because I am not a bioperl developer yet...
>
> Steffen
>
> Allen Day wrote:
>
> >We should keep this onlist, others may be interested as well.
> >
> >On Tue, 12 Oct 2004, Steffen Grossmann wrote:
> >
> >
> >
> >>Dear Allen,
> >>
> >>I just had a look at your module and I think it's a good start. I
> >>immediately have a bunch of ideas how to extend it to get where I think
> >>one should get to. So, I will accept your offer to work on the module
> >>and apply for a bioperl developer's account.
> >>
> >>So here are the first proposals:
> >>1) Very easy: The 'official' GFF3 specification (you know where, don't
> >>you?) states that after the ##FASTA directive there are no more
> >>
> >>
> >
> >Yes, I've written bits of it.
> >
> >
> >
> >>annotations to follow. So, although the ##FASTA directive is not yet
> >>implemented, you should make sure that the rest of the file is not
> >>parsed. At the moment you get back a nonsense-feature for every line
> >>after the ##FASTA line.
> >>
> >>
> >
> >Good. Please add.
> >
> >
> >
> >>2) Actually, it would be nice to be able to retrieve hierarchically
> >>nested collections of features from a GFF-file, where the hierarchy
> >>comes from the 'Parent' tag. The concept of parsing a GFF-file
> >>line-by-line is somehow not compatible with this, because it naturally
> >>can only produce flat arrays of SeqFeatures. Possible workarounds are to
> >>provide some 'unflattening'-mechanism (but where should it naturally
> >>go?), or methods which directly retrieve an array holding the nested
> >>SeqFeatures (which would be an extension to the standard 'next_feature'
> >>approach). I strongly prefer the last option.
> >>
> >>
> >
> >You might want to take advantage of the ### directive here. Parse
> >everything up to it and cache, then start returning hierarchical features
> >from the cache. When the cache empties and the filehandle is still
> >returning lines, fill the cache again. Rinse, repeat.
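
The caching scheme described above can be sketched roughly as follows. This is Python rather than Perl and does not use the Bio::FeatureIO::gff API; the function names and the dict-based feature records are assumptions made for illustration. It also assumes the input holds only feature lines, comments and directives (no ##FASTA section), and that every feature line has nine tab-separated columns.

```python
from typing import Dict, Iterable, Iterator, List


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column such as 'ID=m1;Parent=g1' into a dict."""
    items = (item.split("=", 1) for item in column9.strip().split(";") if "=" in item)
    return {key: value for key, value in items}


def flush(cache: List[dict]) -> List[dict]:
    """Link cached features via ID/Parent and return the top-level ones."""
    by_id = {f["attrs"]["ID"]: f for f in cache if "ID" in f["attrs"]}
    roots = []
    for feature in cache:
        parent_ids = feature["attrs"].get("Parent")
        if parent_ids is None:
            roots.append(feature)
            continue
        # GFF3 allows several comma-separated parents.
        for parent_id in parent_ids.split(","):
            parent = by_id.get(parent_id)
            if parent is None:
                roots.append(feature)  # unknown parent: keep it visible at top level
            else:
                parent["children"].append(feature)
    return roots


def nested_features(lines: Iterable[str]) -> Iterator[dict]:
    """Yield top-level features with their children attached.

    Lines are cached until a '###' directive (or end of input); the cache
    is then turned into a hierarchy, handed out, and emptied again.
    """
    cache: List[dict] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("###"):
            yield from flush(cache)
            cache = []
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            cache.append(
                {"type": cols[2], "attrs": parse_attributes(cols[8]), "children": []}
            )
    yield from flush(cache)  # whatever is left after the last '###'
```

The caller still sees a simple iterator, so the standard one-feature-at-a-time interface is kept; the only change is that each item returned is the root of a small tree. Memory use is bounded by the largest block between two ### lines.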
> >
> >
> >
> >>3) Instead of requiring exact compatibility with SOFA, one could also
> >>simply complain about non SOFA-compatible terms. Additionally, if one
> >>
> >>
> >
> >No, this violates the spec. If you want to do this you can give a
> >##Ontology directive to describe where the new terms came from. Feature
> >type terms need to be SOFA extensions.
> >
> >
> >
> >>would have a mechanism to map non-SOFA terms to SOFA terms, the module
> >>could be used to create SOFA compatible versions of existing GFF files
> >>(which would be a great tool, I think!).
> >>
> >>
> >
> >You can do this via a callback mechanism to allow custom typemappings.
> >Good idea.
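
One way such a callback could look, as a rough Python sketch rather than anything Bio::FeatureIO::gff actually provides. SOFA_TERMS is a tiny illustrative subset standing in for a real ontology lookup, and resolve_type, legacy_mapper and the legacy labels are all hypothetical names.

```python
from typing import Callable, Optional

# Tiny illustrative subset; a real parser would query the ontology instead.
SOFA_TERMS = {"gene", "mRNA", "exon", "CDS"}


def resolve_type(term: str, mapper: Optional[Callable[[str], Optional[str]]] = None) -> str:
    """Return a SOFA term for `term`, consulting a user callback if needed."""
    if term in SOFA_TERMS:
        return term
    if mapper is not None:
        mapped = mapper(term)
        # The callback's answer is validated too, so it cannot smuggle in
        # another non-SOFA term.
        if mapped in SOFA_TERMS:
            return mapped
    raise ValueError(f"feature type {term!r} is not a SOFA term and could not be mapped")


def legacy_mapper(term: str) -> Optional[str]:
    """Example callback: translate some made-up legacy labels to SOFA terms."""
    table = {"messenger_RNA": "mRNA", "Gene": "gene"}
    return table.get(term)
```

With no callback the parser stays strict, as the spec requires; passing one (for example resolve_type("messenger_RNA", legacy_mapper)) turns the same code path into a converter for older files.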
> >
> >-Allen
> >
> >
> >
> >>These are some thoughts I have. I am not sure whether a discussion about
> >>the future development of the module should be conducted within the
> >>Bioperl-l list, or whether we should do it privately and then only post
> >>our proposals once we agree...
> >>
> >>Greetings!
> >>
> >>Steffen
> >>
> >>Allen Day wrote:
> >>
> >>
> >>
> >>>Look at Bio::FeatureIO::gff in bioperl-live. It currently supports
> >>>lookup/validation of ontology terms via Bio::Ontology::OntologyStore, but
> >>>doesn't do any cardinality or type/relation enforcement which you seem to
> >>>be alluding to below.
> >>>
> >>>I'd be very pleased if you want to work on this too. Or anyone else on
> >>>these lists, for that matter :-).
> >>>
> >>>-Allen
> >>>
> >>>
> >>>On Mon, 11 Oct 2004, Steffen Grossmann wrote:
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>>Dear all,
> >>>>
> >>>>I very much like the approach taken by SO(FA)
> >>>>(http://song.sourceforge.net/) to standardize the vocabulary used for
> >>>>sequence annotation. Also, the GFF3 format is a nice way to represent
> >>>>SO-compatible annotations and it would be a great thing to have this all
> >>>>working seamlessly with bioperl.
> >>>>
> >>>>A first step towards such a seamless integration into bioperl would be a
> >>>>parser which is able to read/write hierarchically nested feature
> >>>>collections from/to GFF3 files. Such a parser should make use of the
> >>>>GFF3 specific 'ID' and 'Parent' tags.
> >>>>
> >>>>Of course, I know about the 'Bio::Tools::GFF' and
> >>>>'Bio::SeqFeature::Tools' modules, where some related stuff can be found.
> >>>>The problem is that the 'Bio::Tools::GFF' module doesn't respect the
> >>>>'Parent' and 'ID' tag structure, and grouping in the 'Unflattener'
> >>>>approach is also conceptually different.
> >>>>
> >>>>Does anybody know whether someone is working on such a
> >>>>project? Or, if there is no such project, is someone interested in
> >>>>joining to start it?
> >>>>
> >>>>Thanks in advance for any response!
> >>>>
> >>>>Steffen
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>_______________________________________________
> >>>Bioperl-l mailing list
> >>>Bioperl-l at portal.open-bio.org
> >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >
> >
> >
>
>
>