[Open-bio-l] FW: [GMOD-devel] What is the clear distinction between a feature and a bioentry

Hilmar Lapp hlapp@gnf.org
Wed, 22 May 2002 15:15:37 -0700


OK. As we need to move forward with this, we had a little brainstorming session and came to the following conclusions. Shout now if you think this will make our implementation incompatible with Biosql and/or GMOD.

Bioentry vs. Feature: we decided that everything that
	- lives in a namespace (biodatabase), and
	- has a stable accession and/or ID, and
	- has a sequence (physically in the database or not)
shall be a Bioentry. Features shall be essentially lightweight objects.
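
To make the rule concrete, here is a minimal sketch of the three criteria written as a predicate. It is an illustration only, not part of the schema; the Record class, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # All field names are hypothetical, chosen only for this illustration.
    biodatabase: Optional[str]  # namespace the record lives in, if any
    accession: Optional[str]    # stable accession, if any
    identifier: Optional[str]   # stable ID, if any
    has_sequence: bool          # True even if the sequence is stored elsewhere


def is_bioentry(rec: Record) -> bool:
    """Return True if the record meets all three Bioentry criteria."""
    in_namespace = rec.biodatabase is not None
    has_stable_id = rec.accession is not None or rec.identifier is not None
    return in_namespace and has_stable_id and rec.has_sequence


# Placeholder values, not real accessions.
transcript = Record("example_db", "TRANSCRIPT_ACC_1", None, True)
# An exon with no namespace and no stable accession of its own.
anonymous_exon = Record(None, None, None, True)
```

Anything for which is_bioentry() returns False would be stored as a lightweight Feature instead.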

Furthermore, I added a Chromosome table with a FK to Taxon, and an association table with a location and FKs to Chromosome and Bioentry, for mapping bioentries to genomes. This association table will get an additional FK to DB_Release, which itself has a FK to Biodatabase, to record the assembly on which the mapping was based.
So far I haven't back-ported any of these additions to the MySQL schema. Let me know if you're comfortable with me doing so.
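
For discussion, the table relationships described above could look roughly like the sketch below. It uses Python's sqlite3 module with an in-memory database purely so that it is self-contained. The table name chromosome_map, every column name, and the sample rows are assumptions made for illustration; they are not the actual DDL.

```python
import sqlite3

# Hypothetical DDL: only the foreign-key relationships follow the description
# above; the column names and the chromosome_map table name are assumptions.
SCHEMA = """
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE biodatabase (biodatabase_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE db_release (
    db_release_id  INTEGER PRIMARY KEY,
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase(biodatabase_id),
    version        TEXT NOT NULL
);
CREATE TABLE bioentry (
    bioentry_id    INTEGER PRIMARY KEY,
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase(biodatabase_id),
    accession      TEXT NOT NULL
);
CREATE TABLE chromosome (
    chromosome_id INTEGER PRIMARY KEY,
    taxon_id      INTEGER NOT NULL REFERENCES taxon(taxon_id),
    name          TEXT NOT NULL
);
CREATE TABLE chromosome_map (
    chromosome_id INTEGER NOT NULL REFERENCES chromosome(chromosome_id),
    bioentry_id   INTEGER NOT NULL REFERENCES bioentry(bioentry_id),
    db_release_id INTEGER NOT NULL REFERENCES db_release(db_release_id),
    start_pos     INTEGER NOT NULL,
    end_pos       INTEGER NOT NULL,
    strand        INTEGER NOT NULL
);
"""


def build_demo() -> sqlite3.Connection:
    """Create the sketch schema and load one made-up contig mapping."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO taxon VALUES (1, 'example organism')")
    conn.execute("INSERT INTO biodatabase VALUES (1, 'example_db')")
    conn.execute("INSERT INTO db_release VALUES (1, 1, 'release_A')")
    conn.execute("INSERT INTO bioentry VALUES (1, 1, 'CONTIG_1')")
    conn.execute("INSERT INTO chromosome VALUES (1, 1, '1')")
    conn.execute("INSERT INTO chromosome_map VALUES (1, 1, 1, 1000, 5999, 1)")
    conn.commit()
    return conn


def mapped_entries(conn: sqlite3.Connection, chrom: str, release: str):
    """List (accession, start, end) of bioentries mapped to a chromosome in a release."""
    sql = """
        SELECT b.accession, m.start_pos, m.end_pos
        FROM chromosome_map m
        JOIN chromosome c ON c.chromosome_id = m.chromosome_id
        JOIN bioentry b   ON b.bioentry_id = m.bioentry_id
        JOIN db_release r ON r.db_release_id = m.db_release_id
        WHERE c.name = ? AND r.version = ?
        ORDER BY m.start_pos
    """
    return conn.execute(sql, (chrom, release)).fetchall()
```

Keeping the release on the mapping row means the same bioentry can carry different coordinates for different assemblies.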

We also made the following decisions:
1) As much as possible, Bioentries will be mapped down to chromosomes, even if the datasource only gives the coordinates to contigs. (I think this also aligns it better with Lincoln's DB:GFF view.) Contigs will be retained in the database though, in case they are needed at some time as an entry point.
2) According to the definition given above, genes, transcripts, and proteins will all go into Bioentry. Exons will be features (and therefore not directly mapped to chromosomes).
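
Regarding decision 1, mapping contig-level coordinates down to a chromosome amounts to an offset (and possibly a strand flip). The sketch below assumes 1-based inclusive coordinates and a contig placed on the chromosome as one ungapped block; the Placement type and the to_chromosome function are hypothetical names used only for this illustration.

```python
from typing import NamedTuple, Tuple


class Placement(NamedTuple):
    """Where a contig sits on a chromosome (1-based, inclusive; assumed layout)."""
    chrom: str
    start: int
    end: int
    strand: int  # +1 if the contig is in chromosome orientation, -1 if reversed


def to_chromosome(p: Placement, start: int, end: int, strand: int) -> Tuple[str, int, int, int]:
    """Project a 1-based inclusive contig interval onto the chromosome."""
    if p.strand == 1:
        # Contig position 1 corresponds to chromosome position p.start.
        return (p.chrom, p.start + start - 1, p.start + end - 1, strand)
    # Reversed contig: contig position 1 corresponds to chromosome position p.end,
    # so the interval is mirrored and its strand is flipped.
    return (p.chrom, p.end - end + 1, p.end - start + 1, -strand)


# Made-up example: a 5000 bp contig placed at 1000..5999 on chromosome "1".
forward = Placement("1", 1000, 5999, 1)
reverse = Placement("1", 1000, 5999, -1)
```

Because the contig rows are retained, the original contig coordinates can still be recovered by inverting the same arithmetic.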

With this picture, to better support querying etc., at some point we will need types of Biodatabases and types of Bioentries (contig, gene, transcript, ...). If anyone is aware of an existing controlled vocabulary for that, I'd be grateful for any pointers. SO could solve the Bioentry types; we'll see.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



-----Original Message-----
From: Hilmar Lapp 
Sent: Wednesday, May 22, 2002 10:56 AM
To: GMOD Devel (E-mail); OBDA BioSQL (E-mail)
Cc: td2@sanger.ac.uk; elia@fugu-sg.org; Chris Mungall; Ewan Birney (E-mail)
Subject: [GMOD-devel] What is the clear distinction between a feature and a bioentry


In the process of using Biosql for real life I seem to be hitting the wall again. I've got assembly contigs with chromosome mappings in (and a few other base tables populated), but that's not the real challenge I suppose.

Now for transcripts: they reference a gene, genes may be shared between transcripts (alternative transcripts), and they map with split-locations to contigs (alternatively, you may say they reference a bag of exons that map with simple locations to contigs). Transcripts also come with a native accession and identifier from the external databank, and so do genes. You need to be able to efficiently query for those accessions.

Now what is supposed to go where:
1) Are the genes this resource called supposed to be bioentries, or features on contigs? (I'd put them as bioentries.)
2) Are you supposed to put transcripts as features of the gene, with remote location on the contig? Or as bioentries, with a number of exon features created to have remote locations on the contig it maps to?
3) If any of these shall go as features, does that mean you suggest putting accession and identifier as qualifier/value associations? (I have serious doubts that this can scale for query performance.)

In general, what is the rule for something to end up as a feature or as a bioentry? My feeling is that there really is no rule, which is really bad (because it creates arbitrariness, which essentially counteracts interoperability).

Bioentries map to Bioentries, Features to Bioentries, Features to Features, and Bioentries to Features. All with a location, some with a score, some with additional significance and identity. Contigs map to chromosomes. (Are chromosomes bioentries? Based on an earlier thread, I created a separate table, which led to yet another location map table. Ugly. Do you treat chromosomes as bioentries then? Ugly, too.) Is everything a feature? Maybe you can feel my pain.

Thomas, you said you dumped Ensembl into Biosql. How did you do that? How did you map ensembl transcripts and genes with stable IDs to Biosql? How did you map evidence as evidence in Biosql?

Am I right in my impression that there is no Biosql server running anywhere yet that would demonstrate POC, both in terms of how you import annotated genomes into this schema and how this then scales? Our goal here is to push it to at least POC level within the next, say, 3-4 weeks.

	-hilmar

I apologize if you receive this twice, but our exchange server has been really shaky in delivering list-addressed emails, so I copied in some people.


