[BioSQL-l] seqfeature.display_id
Hilmar Lapp
hlapp@gnf.org
Wed, 25 Sep 2002 14:03:15 -0700
On Wednesday, September 25, 2002, at 01:08 PM, Thomas Down wrote:
> On Wed, Sep 25, 2002 at 12:49:44PM -0700, Hilmar Lapp wrote:
>> In bioperl SeqFeatures have a display_id, and I'd like to serialize
>> this. Would anyone have a problem with me adding display_id to
>> seqfeature as a nullable attribute?
>>
>> Does this attribute exist in other Bio* (BioJava?) as well?
>
> No, not BioJava -- where IDs exist, they tend to go in an
> Annotation property. We've thought at various times about
> adding this to the interfaces, but tend to be scared off
> a little by all the usual scoping and objects-with-multiple-names
> issues. I guess saying that it's a `display' ID is a reasonable
> way of punting these. We're keen to solve this properly in
> BioJava2 -- where some of the issues should be simpler, since
> it will be the FeatureCards which have names, rather than
> any of their Mappings.
>
> I've got no objection to you adding this property -- although
> maybe it would be better to have a many-many association between
> seqfeature and seqfeature_name, as we discussed when you were
> talking about gene_name (I'm still a strong believer in the
> Genes Aren't Special principle).
>
If possible, I'd like to postpone relationships that may be better
captured as n:n until later. Generally, these can all go into
qualifier/value associations, but that has some indexing and
performance downsides which deserve closer investigation before
actually going that route. (I come from the bioperl view of
problems: make it simple now, and become more complicated once
simple doesn't work anymore.)
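To make the three options in this thread concrete (a nullable display_id column on seqfeature, an n:n association between seqfeature and seqfeature_name, or a generic qualifier/value row), here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical simplifications for illustration, not the actual BioSQL DDL, and the feature names are made up.

```python
import sqlite3

# Hypothetical, simplified tables (NOT the real BioSQL schema) showing
# three ways to attach a name to a sequence feature.
SCHEMA = """
CREATE TABLE seqfeature (
    seqfeature_id INTEGER PRIMARY KEY,
    display_id    TEXT                       -- option 1: nullable column
);
CREATE TABLE seqfeature_name (
    seqfeature_name_id INTEGER PRIMARY KEY,
    name               TEXT NOT NULL UNIQUE
);
CREATE TABLE seqfeature_name_assoc (         -- option 2: n:n association
    seqfeature_id      INTEGER NOT NULL REFERENCES seqfeature(seqfeature_id),
    seqfeature_name_id INTEGER NOT NULL REFERENCES seqfeature_name(seqfeature_name_id),
    PRIMARY KEY (seqfeature_id, seqfeature_name_id)
);
CREATE TABLE seqfeature_qualifier_value (    -- option 3: generic tag/value
    seqfeature_id INTEGER NOT NULL REFERENCES seqfeature(seqfeature_id),
    qualifier     TEXT NOT NULL,
    value         TEXT NOT NULL
);
-- Looking features up by name via option 3 needs an index like this one.
CREATE INDEX sqv_qualifier_value ON seqfeature_qualifier_value (qualifier, value);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database populated with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Feature 1 has a display id; feature 2 has none (stored as NULL).
    conn.executemany("INSERT INTO seqfeature VALUES (?, ?)",
                     [(1, "name_a"), (2, None)])
    # Feature 1 carries two names through the n:n association.
    conn.executemany("INSERT INTO seqfeature_name VALUES (?, ?)",
                     [(10, "name_a"), (11, "name_b")])
    conn.executemany("INSERT INTO seqfeature_name_assoc VALUES (?, ?)",
                     [(1, 10), (1, 11)])
    # The same display id stored as a qualifier/value pair.
    conn.execute("INSERT INTO seqfeature_qualifier_value "
                 "VALUES (1, 'display_id', 'name_a')")
    return conn


def display_id(conn, fid):
    """Option 1: a single nullable value read straight off the row."""
    row = conn.execute("SELECT display_id FROM seqfeature "
                       "WHERE seqfeature_id = ?", (fid,)).fetchone()
    return row[0] if row else None


def all_names(conn, fid):
    """Option 2: any number of names, found through a join."""
    rows = conn.execute(
        "SELECT n.name FROM seqfeature_name n "
        "JOIN seqfeature_name_assoc a "
        "  ON a.seqfeature_name_id = n.seqfeature_name_id "
        "WHERE a.seqfeature_id = ?", (fid,)).fetchall()
    return [r[0] for r in rows]


def qualifier_values(conn, fid, qualifier):
    """Option 3: values stored under an arbitrary qualifier name."""
    rows = conn.execute(
        "SELECT value FROM seqfeature_qualifier_value "
        "WHERE seqfeature_id = ? AND qualifier = ?", (fid, qualifier)).fetchall()
    return [r[0] for r in rows]
```

The nullable column is the simplest to query but holds only one name per feature; the association table allows several names at the cost of a join; the qualifier/value route needs no schema change for new kinds of names, but every name lookup goes through a generic table, which is where the indexing and performance questions come in.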
> Thomas.
>
>
> PS. Any idea how many more schema modifications you're going to
> make in the current phase of work? I'm thinking about doing
> a branch of the BioJava adaptors.
There may be some more pending, but I think I have already posted or
made the biggest ones. Generally speaking, once we get our project here
working there will be a huge amount of diverse data funneled into
biosql. Since there is no precedent for this, it is hard to predict
whether the schema will pose road blocks that necessitate changes to
make it work, and if so, which ones. Similarly, query performance may
or may not mandate changes to optimize speed.
Then there are things which aren't there yet at all, but which we will
need for the 'synthesized content' part of our SymGene system: genome
mapping and bioentry-to-bioentry mapping.
We're under significant pressure to get this working, so time-wise
we're not looking at some point in a couple of months, but rather
within the next few weeks. I don't know how this aligns with your
plans for biojava.
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------