[Bioperl-l] Re: Bio::EnsemblLite::UpdateableDB
Ewan Birney
birney@ebi.ac.uk
Mon, 17 Jul 2000 09:30:41 +0000 (GMT)
>
> I guess this is the most correct assessment. Obviously I'd like to see
> more overlap because that means less duplicate coding, but also means we
> have to break down the goals better. I didn't really want to fork from
> Ensembl, but it seems to be addressing data from a different perspective.
> Maybe we should talk about goals of EnsemblLite again.
>
Jason -
I have suddenly realised that you might have rejected some of Ensembl
because it seems to be "contig" focused whereas most things are "just a
sequence" focused. I see single sequence things (e.g. GenBank entries) as
"clones which have one contig". This allows you to view both unfinished
and finished/single sequence/standard GenBank sequences in the same
schema.
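
A minimal sketch of the "clones which have one contig" idea, written in Python purely for illustration. The class and function names here (Clone, Contig, clone_from_single_sequence, total_length) are hypothetical and are not taken from the Ensembl or EnsemblLite code; the accession "ACC0001" is made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contig:
    """One contiguous stretch of sequence (hypothetical class)."""
    contig_id: str
    sequence: str


@dataclass
class Clone:
    """A clone holding one or more contigs (hypothetical class)."""
    clone_id: str
    contigs: List[Contig] = field(default_factory=list)

    @property
    def is_single_sequence(self) -> bool:
        # A finished or plain GenBank-style entry is a clone with one contig.
        return len(self.contigs) == 1


def clone_from_single_sequence(accession: str, sequence: str) -> Clone:
    """Wrap a plain sequence as a clone that has exactly one contig."""
    return Clone(accession, [Contig(accession + ".1", sequence)])


def total_length(clone: Clone) -> int:
    """Sum of contig lengths; the same code serves draft and finished clones."""
    return sum(len(c.sequence) for c in clone.contigs)


# An unfinished clone with two contigs and a finished single sequence
# go through exactly the same interface.
draft = Clone("BAC123X12", [Contig("BAC123X12.1", "ACGT"),
                            Contig("BAC123X12.2", "GGCCA")])
finished = clone_from_single_sequence("ACC0001", "ACGTACGT")
```

Code written against Clone never needs a special case for "just a sequence" entries, which is the point of treating them as one-contig clones.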
I suspect that we will thrash out how these two projects work together when
we meet up at BOSC. I suspect this is going to need a group of us in front
of a whiteboard ;)
> I am most interested in better integrating the 'public domain'
> genome data with laboratory produced experimental data (ie 'OUR' sequences
> for BAC123X12). In the best of all possible worlds, I would like to be
> able to:
> (ewan and I have had this discussion before, but I would like to throw it
> out there and see what the opinions are)
>
> - build a virtual contig (from 100 kb to a couple of MB ) between marker
> D2SXX and D2SXXX that consisted of data in public domain and
> experimentally produced in-house.
> - Annotations and features included and updated automagically from public
> sources.
> - Analyze this X MB of sequence, finding and identifying known and
> predicted genes (this is ensembl like stuff), match them up with
> observed and reported data, find homologies, essentially try and know
> what this sequence does because we think it might be involved in disease
> Y.
>
> This is really hard to do right now, but is also really what I think
> researchers want to do. Computers should make this easy, instead of
> clicking away at multiple genome web sites we should be able to put
> together the known information and sprinkle in our own data. Maybe this
> is what commercial services provide and I am just not in the know... =)
>
> >
> > BTW - Jason - have you handled the "how to store a SeqFeature::Generic"
> > type problem in the SQL?
>
> check out the schema sql/ensembl-lite-mysql-addon.sql (I'll have a pretty
> graphic on the ensembl wiki by next week)
>
> dna_description - describe the sequence, accession number (didn't build in
> multiple accession numbers right now)
> generic_feature - a generic feature for a sequence,
> (name,strand, source, start & end positions)
> feature_detail - tag,value pairs that exist for a feature
> feature_detail_association - associate details with generic features.
>
Sounds very sane. I would like to reuse this over in Ensembl sometime.
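
For anyone following along, here is a rough sketch of how the four tables described above might fit together. It uses Python's sqlite3 with an in-memory database rather than MySQL, and every column name is an assumption made for illustration; the real definitions are in sql/ensembl-lite-mysql-addon.sql. The helper names (open_db, store_feature, load_tags) are hypothetical too.

```python
import sqlite3

# Column names below are guesses based on the table descriptions,
# not the actual EnsemblLite schema.
SCHEMA = """
CREATE TABLE dna_description (
    dna_id      INTEGER PRIMARY KEY,
    accession   TEXT NOT NULL,
    description TEXT
);
CREATE TABLE generic_feature (
    feature_id  INTEGER PRIMARY KEY,
    dna_id      INTEGER NOT NULL REFERENCES dna_description(dna_id),
    name        TEXT,
    strand      INTEGER,
    source      TEXT,
    seq_start   INTEGER,
    seq_end     INTEGER
);
CREATE TABLE feature_detail (
    detail_id   INTEGER PRIMARY KEY,
    tag         TEXT NOT NULL,
    value       TEXT
);
CREATE TABLE feature_detail_association (
    feature_id  INTEGER NOT NULL REFERENCES generic_feature(feature_id),
    detail_id   INTEGER NOT NULL REFERENCES feature_detail(detail_id),
    PRIMARY KEY (feature_id, detail_id)
);
"""


def open_db():
    """Create an in-memory database holding the sketched tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def store_feature(conn, dna_id, name, strand, source, start, end, tags):
    """Insert one generic feature and its (tag, value) pairs; return its id."""
    cur = conn.execute(
        "INSERT INTO generic_feature "
        "(dna_id, name, strand, source, seq_start, seq_end) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (dna_id, name, strand, source, start, end),
    )
    feature_id = cur.lastrowid
    for tag, value in tags:
        detail = conn.execute(
            "INSERT INTO feature_detail (tag, value) VALUES (?, ?)",
            (tag, value),
        )
        conn.execute(
            "INSERT INTO feature_detail_association (feature_id, detail_id) "
            "VALUES (?, ?)",
            (feature_id, detail.lastrowid),
        )
    return feature_id


def load_tags(conn, feature_id):
    """Return the (tag, value) pairs of a feature in insertion order."""
    rows = conn.execute(
        "SELECT d.tag, d.value FROM feature_detail d "
        "JOIN feature_detail_association a ON a.detail_id = d.detail_id "
        "WHERE a.feature_id = ? ORDER BY d.detail_id",
        (feature_id,),
    )
    return rows.fetchall()
```

Tags are kept as a list of pairs rather than a dictionary so that a feature can carry the same tag more than once. The separate association table also leaves room for one detail row to be shared by several features, though this sketch always creates a fresh row.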