[Bioperl-l] Re: [SO-devel] GFF3 - Bioperl - SO

Allen Day allenday at ucla.edu
Wed Oct 13 13:59:30 EDT 2004


We should keep this on-list; others may be interested as well.

On Tue, 12 Oct 2004, Steffen Grossmann wrote:

> Dear Allen,
> 
> I just had a look at your module and I think it's a good start. I
> already have a bunch of ideas for how to extend it to get where I think
> it should go. So, I will accept your offer to work on the module
> and apply for a Bioperl developer account.
> 
> So here are the first proposals:
> 1) Very easy: The 'official' GFF3 specification (you know where, don't 
> you?) states that after the ##FASTA directive there are no more 

Yes, I've written bits of it.

> annotations to follow. So, although the ##FASTA directive is not yet 
> implemented, you should make sure that the rest of the file is not 
> parsed. At the moment you get back a nonsense feature for every line 
> after the ##FASTA line.

Good.  Please add.
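For illustration, here is a minimal Python sketch of the behaviour described in point 1. It is not the actual Bio::FeatureIO::gff code, which is Perl. The reader stops handing lines to the feature parser as soon as it sees the ##FASTA directive. The function name `iter_feature_lines` is hypothetical, and the sketch only skips comments and other directives rather than interpreting them.

```python
import io
from typing import Iterator, TextIO


def iter_feature_lines(fh: TextIO) -> Iterator[str]:
    """Yield GFF3 feature lines, stopping at the ##FASTA directive.

    Everything after ##FASTA is sequence data, not annotation, so it is
    never passed on to the feature parser.
    """
    for raw in fh:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the rest of the file is FASTA: stop parsing features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        yield line


if __name__ == "__main__":
    gff = "ctg1\t.\tgene\t1\t100\t.\t+\t.\tID=g1\n##FASTA\n>ctg1\nACGT\n"
    for feature_line in iter_feature_lines(io.StringIO(gff)):
        print(feature_line)
```

Run on the small example, only the gene line is printed; the `>ctg1` header and sequence are never seen by the parser.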

> 2) Actually, it would be nice to be able to retrieve hierarchically 
> nested collections of features from a GFF file, where the hierarchy 
> comes from the 'Parent' tag. Parsing a GFF file line by line 
> is not really compatible with this, because it can naturally 
> only produce flat arrays of SeqFeatures. Possible workarounds are to 
> provide some 'unflattening' mechanism (but where should it naturally 
> go?), or methods which directly retrieve an array holding the nested 
> SeqFeatures (which would be an extension to the standard 'next_feature' 
> approach). I strongly prefer the latter option.

You might want to take advantage of the ### directive here.  Parse
everything up to it and cache, then start returning hierarchical features
from the cache.  When the cache empties and the filehandle is still
returning lines, fill the cache again.  Rinse, repeat.

> 3) Instead of requiring exact compatibility with SOFA, one could also 
> simply complain about non-SOFA-compatible terms. Additionally, if one 

No, this violates the spec.  If you want to do this, you can give a
##Ontology directive to describe where the new terms came from.  Feature
type terms need to be SOFA extensions.

> would have a mechanism to map non-SOFA terms to SOFA terms, the module 
> could be used to create SOFA compatible versions of existing GFF files 
> (which would be a great tool, I think!).

You can do this via a callback mechanism that allows custom type mappings.  
Good idea.
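One way such a callback could look, sketched in Python under loud assumptions: `SOFA_SUBSET` is a hypothetical handful of term names standing in for a real ontology lookup, and `make_dict_mapper` and `resolve_type` are illustrative names, not part of any BioPerl API. Terms already in the set pass through unchanged; others go to the user-supplied callback. Anything still unresolved is rejected, since feature types must be SOFA terms (this sketch ignores ##Ontology extensions).

```python
from typing import Callable, Dict, Optional

# Hypothetical, deliberately tiny subset of SOFA term names for illustration;
# a real implementation would look terms up in the loaded ontology instead.
SOFA_SUBSET = {"gene", "mRNA", "exon", "CDS", "five_prime_UTR", "three_prime_UTR"}

# A type mapper takes a (possibly non-SOFA) term and returns a replacement or None.
TypeMapper = Callable[[str], Optional[str]]


def make_dict_mapper(table: Dict[str, str]) -> TypeMapper:
    """Build a callback that maps legacy type names through a lookup table."""
    return lambda term: table.get(term)


def resolve_type(term: str, mapper: Optional[TypeMapper] = None) -> str:
    """Return a SOFA term for `term`, consulting the user callback if needed."""
    if term in SOFA_SUBSET:
        return term
    if mapper is not None:
        mapped = mapper(term)
        if mapped in SOFA_SUBSET:
            return mapped
    raise ValueError(f"type {term!r} is not a SOFA term and has no mapping")


if __name__ == "__main__":
    legacy = make_dict_mapper({"UTR5": "five_prime_UTR", "Exon": "exon"})
    print(resolve_type("UTR5", legacy))
```

Keeping the mapping in a caller-supplied function means the parser itself never has to know about any particular legacy vocabulary; rewriting an old GFF file into a SOFA-compatible one becomes a matter of supplying the right table.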

-Allen

> These are some thoughts I have. I am not sure whether a discussion about 
> the future development of the module should be conducted within the 
> Bioperl-l list, or whether we should do it privately and then only post 
> our proposals once we agree...
> 
> Greetings!
> 
> Steffen
> 
> Allen Day wrote:
> 
> >Look at Bio::FeatureIO::gff in bioperl-live.  It currently supports
> >lookup/validation of ontology terms via Bio::Ontology::OntologyStore, but
> >doesn't do any cardinality or type/relation enforcement, which you seem to
> >be alluding to below.
> >
> >I'd be very pleased if you want to work on this too.  Or anyone else on
> >these lists, for that matter :-).
> >
> >-Allen
> >
> >
> >On Mon, 11 Oct 2004, Steffen Grossmann wrote:
> >
> >  
> >
> >>Dear all,
> >>
> >>I very much like the approach taken by SO(FA) 
> >>(http://song.sourceforge.net/) to standardize the vocabulary used for 
> >>sequence annotation. Also, the GFF3 format is a nice way to represent 
> >>SO-compatible annotations and it would be a great thing to have this all 
> >>working seamlessly with bioperl.
> >>
> >>A first step towards such a seamless integration into bioperl would be a 
> >>parser which is able to read/write hierarchically nested feature 
> >>collections from/to GFF3 files. Such a parser should make use of the 
> >>GFF3 specific 'ID' and 'Parent' tags.
> >>
> >>Of course, I know about the 'Bio::Tools::GFF' and 
> >>'Bio::SeqFeature::Tools' modules, where some related stuff can be found. 
> >>The problem is that the 'Bio::Tools::GFF' module doesn't respect the 
> >>'Parent' and 'ID' tag structure, and grouping in the 'Unflattener' 
> >>approach is done in a conceptually different way.
> >>
> >>Does anybody know whether someone is working on such a 
> >>project? Or, if there is no such project, is anyone interested in 
> >>joining to start it?
> >>
> >>Thanks in advance for any response!
> >>
> >>Steffen
> >>
> >>
> >>    
> >>
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at portal.open-bio.org
> >http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> >  
> >
> 
> 
> 

