[BioSQL-l] seqfeature.display_id

Hilmar Lapp hlapp@gnf.org
Wed, 25 Sep 2002 14:03:15 -0700


On Wednesday, September 25, 2002, at 01:08 PM, Thomas Down wrote:

> On Wed, Sep 25, 2002 at 12:49:44PM -0700, Hilmar Lapp wrote:
>> In bioperl SeqFeatures have a display_id, and I'd like to serialize
>> this. Would anyone have a problem with me adding display_id to
>> seqfeature as a nullable attribute?
>>
>> does this attribute exist in other Bio* (BioJava?) as well?
>
> No, not BioJava -- where IDs exist, they tend to go in an
> Annotation property.  We've thought at various times about
> adding this to the interfaces, but tend to be scared off
> a little by all the usual scoping and objects-with-multiple-names
> issues.  I guess saying that it's a `display' ID is a reasonable
> way of punting these.  We're keen to solve this property in
> BioJava2 -- where some of the issues should be simpler, since
> it will be the FeatureCards which have names, rather than
> any of their Mappings.
>
> I've got no objection to you adding this property -- although
> maybe it would be better to have a many-many association between
> seqfeature and seqfeature_name, as we discussed when you were
> talking about gene_name (I'm still a strong believer in the
> Genes Aren't Special principle).
>

I'd like to postpone some relationships which may be better captured 
as n:n to a later time if possible ... Generally, these can all go 
into qualifier/value associations, but that's got some indexing and 
performance downsides which deserve better investigation before 
actually going that route. (I come from the bioperl view of 
problems: make it simple now, and become more complicated once 
simple doesn't work anymore)
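
To make the two options concrete, here is a rough sketch using Python's
sqlite3 module with an in-memory database. The table layouts are heavily
simplified stand-ins, not the actual BioSQL DDL; the seqfeature_name and
seqfeature_name_assoc tables, the column sizes, and the sample values are
assumptions made up for illustration only.

```python
import sqlite3


def build_schema():
    """Create a toy schema showing both naming options side by side."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical, stripped-down seqfeature table (not the real BioSQL DDL).
    cur.execute(
        "CREATE TABLE seqfeature ("
        " seqfeature_id INTEGER PRIMARY KEY,"
        " bioentry_id INTEGER NOT NULL)"
    )

    # Option 1: display_id as a nullable attribute on seqfeature.
    cur.execute("ALTER TABLE seqfeature ADD COLUMN display_id VARCHAR(64)")

    # Option 2: a many-to-many association between features and names.
    # Both tables are assumed names for this sketch.
    cur.execute(
        "CREATE TABLE seqfeature_name ("
        " seqfeature_name_id INTEGER PRIMARY KEY,"
        " name VARCHAR(64) NOT NULL UNIQUE)"
    )
    cur.execute(
        "CREATE TABLE seqfeature_name_assoc ("
        " seqfeature_id INTEGER NOT NULL"
        "   REFERENCES seqfeature(seqfeature_id),"
        " seqfeature_name_id INTEGER NOT NULL"
        "   REFERENCES seqfeature_name(seqfeature_name_id),"
        " PRIMARY KEY (seqfeature_id, seqfeature_name_id))"
    )
    return conn


def load_sample_data(conn):
    """Insert made-up rows: feature 1 has a display_id, feature 2 has none."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO seqfeature (seqfeature_id, bioentry_id, display_id)"
        " VALUES (1, 10, 'exon-1')"
    )
    cur.execute("INSERT INTO seqfeature (seqfeature_id, bioentry_id) VALUES (2, 10)")
    cur.executemany(
        "INSERT INTO seqfeature_name (seqfeature_name_id, name) VALUES (?, ?)",
        [(1, "exon-1"), (2, "E1")],
    )
    # Feature 1 carries two names; the name 'exon-1' is shared with feature 2.
    cur.executemany(
        "INSERT INTO seqfeature_name_assoc VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 1)],
    )
    conn.commit()


def display_ids(conn):
    """Option 1 lookup: a single column read, NULL where no id was set."""
    return conn.execute(
        "SELECT seqfeature_id, display_id FROM seqfeature ORDER BY seqfeature_id"
    ).fetchall()


def names_for(conn, seqfeature_id):
    """Option 2 lookup: a join through the association table."""
    rows = conn.execute(
        "SELECT n.name FROM seqfeature_name n"
        " JOIN seqfeature_name_assoc a"
        "   ON a.seqfeature_name_id = n.seqfeature_name_id"
        " WHERE a.seqfeature_id = ? ORDER BY n.name",
        (seqfeature_id,),
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = build_schema()
    load_sample_data(conn)
    print(display_ids(conn))
    print(names_for(conn, 1))
```

The nullable column keeps the lookup to a single table read, whereas the
n:n variant allows several names per feature at the cost of a join. This
says nothing about how either would actually perform on real data.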

>       Thomas.
>
>
> PS. Any idea how many more schema modifications you're going to
>     make in the current phase of work?  I'm thinking about doing
>     a branch of the BioJava adaptors.

There may be some more pending, but I think I already posted or made 
the biggest ones. Generally speaking, once we get our project here 
working there will be a huge amount of diverse data funneled into 
biosql. Since there is no precedent for this, it is hard to predict 
whether the schema will pose road blocks that necessitate changes, 
and if so which ones. Similarly, query performance may or may not 
mandate changes to optimize speed.

Then there are things which aren't there yet at all, but which we will 
need for the 'synthesized content' part of our SymGene system. This 
is genome mapping and bioentry-to-bioentry mapping.

We're under significant pressure to get this working, so time-wise 
we're not looking at some point in a couple of months, but rather 
within the next few weeks. I don't know how this aligns with your 
plans for biojava.

	-hilmar

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------