[Open-bio-l] What is the clear distinction between a feature and a bioentry

Hilmar Lapp hlapp@gnf.org
Wed, 22 May 2002 10:56:17 -0700


In the process of using Biosql on real-life data I seem to be hitting a wall again. I've got assembly contigs with chromosome mappings loaded (and a few other base tables populated), but I suppose that's not the real challenge.

Now for transcripts: they reference a gene, and a gene may be shared between transcripts (alternative transcripts). They map with split locations to contigs (alternatively, you may say they reference a bag of exons that map with simple locations to contigs). Transcripts also come with a native accession and identifier from the external databank, and so do genes. You need to be able to query efficiently for those accessions.
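
To make the relationships described above concrete, here is a minimal sketch in Python. It is not Biosql's object model or any Bioperl API; all class names, field names, and accessions below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class SimpleLocation:
    """One contiguous stretch on a contig (hypothetical structure)."""
    contig_accession: str
    start: int
    end: int
    strand: int


@dataclass
class Gene:
    """A gene with its native accession and identifier."""
    accession: str
    identifier: str


@dataclass
class Transcript:
    """A transcript references one gene and maps to a contig through a
    split location, modeled here as a list of exon locations."""
    accession: str
    identifier: str
    gene: Gene
    exons: List[SimpleLocation] = field(default_factory=list)

    def span(self) -> Tuple[int, int]:
        """Outermost start and end over all exons."""
        return (min(e.start for e in self.exons),
                max(e.end for e in self.exons))


# Two alternative transcripts sharing one gene (made-up accessions).
gene = Gene("G0001", "gene-1")
t1 = Transcript("T0001", "tx-1", gene, [
    SimpleLocation("CTG1", 100, 200, 1),
    SimpleLocation("CTG1", 300, 400, 1),
])
t2 = Transcript("T0002", "tx-2", gene, [
    SimpleLocation("CTG1", 100, 200, 1),
    SimpleLocation("CTG1", 500, 650, 1),
])
```

The sketch deliberately leaves open the question asked below: whether `Gene` and `Transcript` end up as bioentries or as features in the schema.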

Now what is supposed to go where:
1) Are the genes called by this resource supposed to be bioentries, or features on contigs? (I'd put them as bioentries.)
2) Are you supposed to put transcripts as features of the gene, with a remote location on the contig? Or as bioentries, with a number of exon features created to have remote locations on the contig they map to?
3) If any of these shall go as features, does that mean you suggest putting accession and identifier as qualifier/value associations? (I have serious doubts that this can scale for query performance.)
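
The concern in question 3 can be illustrated with a small sketch using Python's built-in sqlite3 module. The tables below are NOT the actual Biosql schema; `entry`, `feature`, and `feature_qualifier` are hypothetical, simplified stand-ins that only show the two different query shapes: an accession stored in a dedicated, indexed column versus an accession stored as a generic qualifier/value pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option A: accession is a first-class, indexed column.
    CREATE TABLE entry (
        entry_id   INTEGER PRIMARY KEY,
        accession  TEXT NOT NULL,
        identifier TEXT
    );
    CREATE UNIQUE INDEX entry_accession_idx ON entry (accession);

    -- Option B: accession is just one of many name/value qualifiers.
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        entry_id   INTEGER NOT NULL REFERENCES entry (entry_id)
    );
    CREATE TABLE feature_qualifier (
        feature_id INTEGER NOT NULL REFERENCES feature (feature_id),
        name       TEXT NOT NULL,
        value      TEXT NOT NULL
    );
""")

# Made-up data: a contig and a gene as entries, a transcript as a feature.
conn.execute("INSERT INTO entry VALUES (1, 'CTG1', 'contig-1')")
conn.execute("INSERT INTO entry VALUES (2, 'G0001', 'gene-1')")
conn.execute("INSERT INTO feature VALUES (10, 2)")
conn.executemany(
    "INSERT INTO feature_qualifier VALUES (?, ?, ?)",
    [(10, "accession", "T0001"), (10, "identifier", "tx-1")],
)


def find_entry(conn, accession):
    """Look up an entry through the dedicated accession column."""
    row = conn.execute(
        "SELECT entry_id FROM entry WHERE accession = ?", (accession,)
    ).fetchone()
    return row[0] if row else None


def find_feature(conn, accession):
    """Look up a feature through its qualifier/value association."""
    row = conn.execute(
        "SELECT feature_id FROM feature_qualifier "
        "WHERE name = 'accession' AND value = ?",
        (accession,),
    ).fetchone()
    return row[0] if row else None
```

The sketch measures nothing. Whether the qualifier lookup performs acceptably would depend on the database, the indexes placed on the qualifier table, and how many qualifiers are stored per feature.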

In general what is the rule for something to end up as a feature, or as a bioentry? My feeling is that there really is no rule, which is really bad (because it creates arbitrariness, which essentially counteracts interoperability). 

Bioentries map to Bioentries, Features to Bioentries, Features to Features, and Bioentries to Features. All with a location, some with a score, some with additional significance and identity. Contigs map to chromosomes. (Are chromosomes bioentries? Based on an earlier thread, I created a separate table, which led to yet another location map table. Ugly. Do you treat chromosomes as bioentries then? Ugly, too.) Is everything a feature? Maybe you can feel my pain.

Thomas, you said you dumped Ensembl into Biosql. How did you do that? How did you map Ensembl transcripts and genes with stable IDs to Biosql? How did you map evidence as evidence in Biosql?

Am I right in my impression that there is no Biosql server running anywhere yet that would demonstrate a proof of concept (POC), both in terms of how you import annotated genomes into this schema and how this then scales? Our goal here is to push it to at least POC level within the next, say, 3-4 weeks.

	-hilmar

I apologize if you receive this twice, but our exchange server has been really shaky in delivering list-addressed emails, so I copied in some people.

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------