[Open-bio-l] seqfeature_location

Hilmar Lapp hlapp@gnf.org
Tue, 30 Apr 2002 13:22:55 -0700


Even though the schema is very generic, it puts quite a burden on the application layer for seqfeature_locations. Ideally, you would want to obtain one location (and there can already be many for one feature) in one row, in one hit of the db. If it's a simple location that's not a problem; if it's a fuzzy loc it becomes kind of ugly: you need to know the ontology exactly, and condense 4 rows into 1 object.
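
To illustrate the kind of gymnastics meant here, a minimal Python sketch, assuming the fuzzy bounds come back from the db as one (term name, value) row each. The row shape, the MIN_END/MAX_END term names, and the names FuzzyLocation and condense are all made up for illustration; this is not the actual schema or any existing adaptor code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Assumed term names; the application has to know these exactly.
TERM_TO_FIELD = {
    "MIN_START": "min_start",
    "MAX_START": "max_start",
    "MIN_END": "min_end",
    "MAX_END": "max_end",
}


@dataclass
class FuzzyLocation:
    # None means that bound was not supplied by any row.
    min_start: Optional[int] = None
    max_start: Optional[int] = None
    min_end: Optional[int] = None
    max_end: Optional[int] = None


def condense(rows: Iterable[Tuple[str, int]]) -> FuzzyLocation:
    """Fold up to four (term_name, value) rows into one location object."""
    loc = FuzzyLocation()
    for term_name, value in rows:
        field = TERM_TO_FIELD.get(term_name)
        if field is None:
            raise ValueError(f"unknown location term: {term_name}")
        setattr(loc, field, value)
    return loc


# Four rows (hypothetical values) for a single fuzzy location.
rows = [("MIN_START", 10), ("MAX_START", 15), ("MIN_END", 40), ("MAX_END", 42)]
loc = condense(rows)
```

Even in this toy form, the application needs a hard-wired term table and several rows per location before it can build a single object.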

I'd suggest having 3 attributes instead of 1 each for start and end in seqfeature_location: start_pos, start_range, and start_type, and the same for end. <start|end>_type are FKs to ontology_term. <start|end>_range gives the range for BETWEEN type locations. Otherwise <start|end>_pos would be the known end of the range, or the position itself if range == 0 (simple loc). I.e., for MIN_START start_pos would be min_start (max_start unknown), for MAX_START start_pos would be max_start (min_start unknown), and so forth.
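
A rough sketch of how this could look, using Python's sqlite3 module with an in-memory database. Several things here are assumptions and not part of the proposal: the tables are cut down to the columns needed for the example, a simple location is stored with a NULL type, for BETWEEN the pos column is taken to be the lower bound with pos + range as the upper bound, and the helper names bounds and locations are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ontology_term (
    ontology_term_id INTEGER PRIMARY KEY,
    term_name        TEXT NOT NULL UNIQUE
);
CREATE TABLE seqfeature_location (
    seqfeature_location_id INTEGER PRIMARY KEY,
    seqfeature_id INTEGER NOT NULL,
    start_pos     INTEGER NOT NULL,
    start_range   INTEGER NOT NULL DEFAULT 0,
    start_type    INTEGER REFERENCES ontology_term(ontology_term_id),
    end_pos       INTEGER NOT NULL,
    end_range     INTEGER NOT NULL DEFAULT 0,
    end_type      INTEGER REFERENCES ontology_term(ontology_term_id)
);
""")
conn.executemany(
    "INSERT INTO ontology_term (ontology_term_id, term_name) VALUES (?, ?)",
    [(1, "BETWEEN"), (2, "MIN_START"), (3, "MAX_START"),
     (4, "MIN_END"), (5, "MAX_END")],
)
conn.executemany(
    "INSERT INTO seqfeature_location (seqfeature_id, start_pos, start_range,"
    " start_type, end_pos, end_range, end_type) VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 100, 0, None, 200, 0, None),  # simple location 100..200
        (2, 10, 5, 1, 40, 0, 4),          # start BETWEEN 10 and 15, end MIN_END 40
    ],
)


def bounds(pos, rng, type_name):
    """Return (min, max) for one end; None marks an unknown bound."""
    if type_name is None:            # assumed encoding of a simple loc
        return (pos, pos)
    if type_name == "BETWEEN":       # assumed: pos is the lower bound
        return (pos, pos + rng)
    if type_name.startswith("MIN_"):
        return (pos, None)           # min known, max unknown
    if type_name.startswith("MAX_"):
        return (None, pos)           # max known, min unknown
    raise ValueError(f"unknown location type: {type_name}")


QUERY = """
SELECT l.start_pos, l.start_range, st.term_name,
       l.end_pos,   l.end_range,   et.term_name
FROM seqfeature_location l
LEFT JOIN ontology_term st ON st.ontology_term_id = l.start_type
LEFT JOIN ontology_term et ON et.ontology_term_id = l.end_type
WHERE l.seqfeature_id = ?
ORDER BY l.seqfeature_location_id
"""


def locations(conn, seqfeature_id):
    """One row per location, fuzzy or not: a single hit of the db."""
    return [
        (bounds(sp, sr, st), bounds(ep, er, et))
        for sp, sr, st, ep, er, et in conn.execute(QUERY, (seqfeature_id,))
    ]
```

With this layout, locations(conn, 2) builds the fuzzy location from a single row; the only ontology knowledge left in the application is the handful of type names in bounds.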

Admittedly this is not as generic as the current schema, but it requires far fewer gymnastics on the application end (and is likely to perform significantly better, as those tables are supposedly high-volume). I also strongly hope that fuzzy locs won't become even weirder than they are right now.

Can anyone warm up to that?

If not, can anyone warm me up to something else?

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------