[Bioperl-l] Bio::FeatureHolderI
Matthew Pocock
matthew_pocock@yahoo.co.uk
Mon, 18 Nov 2002 23:55:59 +0000
Ewan Birney wrote:
>
> Agreed (on both points). We should definitely do this post 1.2...
>
I strongly agree with this. As the one who is writing the biojava query
optimizer, I caution against adding queries to 1.2 - you will need a
minimum of 6 months to get it even approximately right if you intend to
do query planning. There are subtle issues with queries - are you
matching top-level features only, or are you traversing the feature
tree? Will you allow user-defined filters? Are you returning a set of
interesting features, or are you returning cut-down feature hierarchies?
Do you want to allow an entire xpath-style tree searching language, or
stick to simply filtering features by their own properties? Do you want
to be able to filter one feature by the properties of another (e.g.
repeat that overlaps an exon) or is this out-of-bounds data (e.g. make a
location from the exon positions, then filter by and(repeat, overlaps location))?
Are these things intended for people or for computers? Do you have any
intention of passing these over the wire? Can they be applied to any
'FeatureHolderI' or just to sequences, or just to features, or perhaps
to entire sequence DBs? Do feature holders know anything about the
filters that would hit their sequences? Will this play well with some
extended querying capabilities, e.g. could something functionally
equivalent to
seqDB.filterSequences(byHasGoTerm(foo)).filterFeatures(exons) be
expressed as a single query object (graph, hierarchy, whatever)? With
queries, meta-data is everything. Without it, you end up with
unmaintainable spaghetti.
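
To make two of these questions concrete, here is a rough sketch in Python (chosen only for brevity). It assumes a toy Feature class with nested children and closed-interval coordinates. Every name in it (Feature, by_type, overlaps, and_, filter_features) is made up for illustration and is not BioPerl or BioJava API. It shows how the same filter gives different answers depending on whether you match top-level features only or walk the feature tree, and one way to express and(repeat, overlaps location) after collecting the exon locations first.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List


@dataclass
class Feature:
    # Hypothetical feature: a type, closed-interval coordinates, sub-features.
    type: str
    start: int
    end: int
    children: List["Feature"] = field(default_factory=list)


Filter = Callable[[Feature], bool]


def by_type(name: str) -> Filter:
    return lambda f: f.type == name


def overlaps(start: int, end: int) -> Filter:
    # Two closed intervals overlap when each starts before the other ends.
    return lambda f: f.start <= end and start <= f.end


def and_(*filters: Filter) -> Filter:
    return lambda f: all(flt(f) for flt in filters)


def walk(features: Iterable[Feature], recurse: bool) -> Iterator[Feature]:
    # Yield top-level features, and their descendants too if recurse is set.
    for f in features:
        yield f
        if recurse:
            yield from walk(f.children, True)


def filter_features(features, flt: Filter, recurse: bool = False) -> List[Feature]:
    return [f for f in walk(features, recurse) if flt(f)]


gene = Feature("gene", 100, 900, children=[
    Feature("exon", 100, 300),
    Feature("exon", 600, 900),
])
top_level = [gene, Feature("repeat", 250, 350), Feature("repeat", 400, 450)]

# Same filter, different answers: the exons live below the gene.
exons_top = filter_features(top_level, by_type("exon"))
exons_all = filter_features(top_level, by_type("exon"), recurse=True)

# Collect exon locations first, then filter by and(repeat, overlaps location).
exon_spans = [(e.start, e.end) for e in exons_all]
repeats_on_exons = [
    r for r in top_level
    if any(and_(by_type("repeat"), overlaps(s, e))(r) for s, e in exon_spans)
]
```

Note that the filters here are opaque closures: nothing outside them can tell what they test, which is exactly the lack of meta-data complained about above.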
Above all, do you really want to do this all yourself, from scratch, for
every data-type people may be interested in, or do you want to off-load
this to the new ontology stuff and get someone clever like ChrisM to
write code to generate the query framework from a proper ontological
definition of the bioperl objects? Sounds scary, but believe me, you
will not want to maintain query code for everything people want to query.
So, above all, I would suggest coming up with a page full of queries
you would like to express (like find exons) and things you think could
be optimized (like we're in ensembl, just scan the exon table). Start
thinking /seriously/ about your meta-data, and formalize this so that
you have an object model for representing meta-data. Read good books on
lambda calculus, Prolog and the dragon compiler book. Learn something
like Haskell, Lisp and Prolog. Learn SQL & AQL. Decide if the queries are
based around text (like SQL) or are syntax trees (like in XQuery).
Then, and only then, start to code this up.
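
As a rough illustration of why the syntax-tree form and the meta-data matter, the sketch below (Python, all names hypothetical, not taken from BioPerl, BioJava or Ensembl) represents a query as plain data. Because the query can be inspected, a store with a per-type index can notice that every match must be an exon and look only at its exon bucket, which is loosely the "just scan the exon table" case. It assumes features are simple (type, start, end) tuples with closed intervals, and it is nowhere near a real query planner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByType:
    name: str


@dataclass(frozen=True)
class Overlaps:
    start: int
    end: int


@dataclass(frozen=True)
class And:
    left: object
    right: object


def matches(query, feature) -> bool:
    # Evaluate a query tree against one (type, start, end) tuple.
    ftype, start, end = feature
    if isinstance(query, ByType):
        return ftype == query.name
    if isinstance(query, Overlaps):
        return start <= query.end and query.start <= end
    if isinstance(query, And):
        return matches(query.left, feature) and matches(query.right, feature)
    raise TypeError(f"unknown query node: {query!r}")


def required_type(query):
    # Return a type that every match must have, or None if the tree does not say.
    if isinstance(query, ByType):
        return query.name
    if isinstance(query, And):
        return required_type(query.left) or required_type(query.right)
    return None


class IndexedStore:
    # Hypothetical store with one bucket per feature type.
    def __init__(self, features):
        self.all = list(features)
        self.by_type = {}
        for f in self.all:
            self.by_type.setdefault(f[0], []).append(f)
        self.scanned = 0  # how many features the last query examined

    def run(self, query):
        wanted = required_type(query)
        candidates = self.all if wanted is None else self.by_type.get(wanted, [])
        self.scanned = len(candidates)
        return [f for f in candidates if matches(query, f)]


store = IndexedStore([
    ("exon", 100, 200),
    ("repeat", 150, 180),
    ("exon", 300, 400),
    ("repeat", 500, 600),
])
hits = store.run(And(ByType("exon"), Overlaps(150, 350)))
```

With an opaque accept(feature) callback the store could not do this, since it would have no way to see that the filter only ever accepts exons.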
Or, add a feature method to FeatureHolderI tonight with a FilterI with
accept(feature) and then learn all this after. That's what we did.
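
For what it's worth, the quick route looks roughly like this. It is a minimal sketch in Python rather than Perl or Java, and FeatureFilter, ByType, FeatureHolder and features() are hypothetical stand-ins for FilterI and FeatureHolderI, not the actual interfaces in either project.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    # Hypothetical minimal feature record.
    type: str
    start: int
    end: int


class FeatureFilter:
    # Stand-in for FilterI: a single accept(feature) method.
    def accept(self, feature: Feature) -> bool:
        raise NotImplementedError


class ByType(FeatureFilter):
    def __init__(self, type_name: str):
        self.type_name = type_name

    def accept(self, feature: Feature) -> bool:
        return feature.type == self.type_name


class FeatureHolder:
    # Stand-in for FeatureHolderI: holds features, optionally filtered.
    def __init__(self, features):
        self._features = list(features)

    def features(self, feature_filter=None):
        # No filter means everything; otherwise keep what the filter accepts.
        if feature_filter is None:
            return list(self._features)
        return [f for f in self._features if feature_filter.accept(f)]


holder = FeatureHolder([
    Feature("exon", 100, 200),
    Feature("repeat", 150, 180),
    Feature("exon", 300, 400),
])
exons = holder.features(ByType("exon"))
```

This answers none of the questions above (it only looks at the holder's own features and the filter is a black box), which is the point: it works tonight and leaves the hard parts for later.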
One other thing, whatever query code you end up with, it's likely to
have a high bus quotient: (no. of people using the code) / (no. of people
who could be hit by a bus and the code still get maintained). I don't
really see how that's avoidable. This stuff falls into the same category
as writing DP code generators & fast matrix math libraries.
Welcome to semantic hell.
Matthew
ps did I mention that you need meta-data?
--
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk