[Bioperl-l] bioperl-db revival

Jason Stajich jason@chg.mc.duke.edu
Tue, 27 Mar 2001 10:18:59 -0500 (EST)


On Tue, 27 Mar 2001, Ewan Birney wrote:

>
>
> I have decided to revive bioperl-db for "simple" sequence storage.
>
> [note to Hilmar/Jason - this is definitely the right place to put this now
> I have written some of the SQL/code. It is completely bioperl dependent at
> the moment. Our discussion about wider ranging infrastructure is
> definitely worth keeping alive but this project is so dependent on bioperl
> there's not much point in pretending otherwise]
>
>
> The aim is a pretty vanilla set of SQL tables, quite normalised, nothing
> fancy. The only sticking point is the dreaded feature locations: if we
> go for the full fuzzy modelling (aaaaarh not the fuzzies again) I
> will need help (jason - can you face the fuzzies again?).
>
Yeah, yeah, I knew once I got into fuzzy world I'd be stuck.  It's not so
bad now that we have a sensible object model to handle them.  Send your
queries my way when you get stuck.
>
> At the moment I am going to have a cheeky "unparsed_location" string table
> for the fuzzies.
>
I want to warn that the fuzzy parsing is NOT 100%; it will fail on some
horribly formatted strings.  I have not attempted to parse complete EMBL
as they are doing on the biojava side, so we may have a couple of odd
cases that fail.  That being said, it has worked on all the test cases
we have included (t/testfuzzy.genbank).
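
A minimal sketch of how that fallback could work, written in Python with
sqlite3 purely for illustration (not the Perl/SQL that bioperl-db would
use). Locations that a deliberately simple parser understands go into a
structured table; anything else is kept verbatim in unparsed_location.
Apart from the unparsed_location table name, every table, column, regex
and function name here is an assumption.

```python
import re
import sqlite3

# Hypothetical parser: only understands plain "start..end" locations.
SIMPLE_LOCATION = re.compile(r"^(\d+)\.\.(\d+)$")


def create_tables(conn):
    """Create one structured table and one raw-string fallback table."""
    conn.execute(
        "CREATE TABLE location ("
        "feature_id INTEGER, seq_start INTEGER, seq_end INTEGER)"
    )
    conn.execute(
        "CREATE TABLE unparsed_location ("
        "feature_id INTEGER, location_string TEXT)"
    )


def store_location(conn, feature_id, location_string):
    """Store a parsed location if possible, otherwise keep the raw string.

    Returns the name of the table that was written to.
    """
    match = SIMPLE_LOCATION.match(location_string.strip())
    if match:
        start, end = int(match.group(1)), int(match.group(2))
        conn.execute(
            "INSERT INTO location VALUES (?, ?, ?)", (feature_id, start, end)
        )
        return "location"
    # Anything the parser cannot handle is preserved untouched.
    conn.execute(
        "INSERT INTO unparsed_location VALUES (?, ?)",
        (feature_id, location_string),
    )
    return "unparsed_location"
```

Nothing is lost this way: rows in unparsed_location can be re-parsed later
once the fuzzy handling improves.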

>
> To bind to the SQL tables I will have a series of pretty vanilla
> "adaptors" talking Ensembl or FlyBase, equivalent I believe to Java Bean
> singleton objects talking java (? java experts to correct me) which
> mediate persistence of Bio::Seq objects.
>
> The only wrinkle here is how to detect and deal nicely with one-instance
> style objects, such as Species (Homo sapiens). I hope to glean some wisdom
> from Arne Stabenau on this tomorrow, but if anyone has some helpful ideas,
> just put them down here.
>
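
One common way to handle one-instance objects like Species is a
get-or-create lookup behind the adaptor, backed by a UNIQUE constraint so
the table can never hold two rows for the same species. A minimal sketch,
in Python with sqlite3 for illustration only; the class, table, column and
method names are all assumptions, not an existing bioperl-db API.

```python
import sqlite3


class SpeciesAdaptor:
    """Hypothetical adaptor: one row per species, plus a small id cache."""

    def __init__(self, conn):
        self.conn = conn
        self._cache = {}  # normalised binomial name -> species_id
        conn.execute(
            "CREATE TABLE IF NOT EXISTS species ("
            "species_id INTEGER PRIMARY KEY, "
            "binomial TEXT NOT NULL UNIQUE)"
        )

    def fetch_or_store(self, binomial):
        """Return the id for this species, inserting a row only if new."""
        key = binomial.strip().lower()
        if key in self._cache:
            return self._cache[key]
        # UNIQUE + INSERT OR IGNORE keeps the table at one row per species.
        self.conn.execute(
            "INSERT OR IGNORE INTO species (binomial) VALUES (?)", (key,)
        )
        row = self.conn.execute(
            "SELECT species_id FROM species WHERE binomial = ?", (key,)
        ).fetchone()
        self._cache[key] = row[0]
        return row[0]
```

Every sequence that refers to the same species then shares one species_id,
however many times the species is seen during loading.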
>
> For output, it will definitely support the appropriate Bio::DB::*I
> interfaces and probably extend them for simple field orientated queries
> that can be easily mapped to SQL (nothing fancy)
>
>
> For input it will have a loader script that can load a SeqIO stream into
> the database.
>
Nice.  Some of my existing code in bioperl-db started on this, but I
didn't get very far straddling the singleton sequences vs assembling them
a la Ensembl.  Your approach of serving up seqs as singletons (unrelated
seqs) here should be very doable.
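
To make the adaptor-plus-loader idea concrete, here is a rough sketch in
Python with sqlite3 (illustration only; the real thing would be Perl
adaptors fed by a SeqIO stream). A tiny FASTA reader stands in for SeqIO,
and the SeqAdaptor class, the entry table and all method names are
assumptions.

```python
import io
import sqlite3


class SeqAdaptor:
    """Hypothetical adaptor mediating persistence of simple sequences."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS entry ("
            "entry_id INTEGER PRIMARY KEY, "
            "display_id TEXT NOT NULL, "
            "sequence TEXT NOT NULL)"
        )

    def store(self, display_id, sequence):
        """Insert one sequence and return its new primary key."""
        cur = self.conn.execute(
            "INSERT INTO entry (display_id, sequence) VALUES (?, ?)",
            (display_id, sequence),
        )
        return cur.lastrowid

    def fetch_by_display_id(self, display_id):
        """Return the stored sequence string, or None if absent."""
        row = self.conn.execute(
            "SELECT sequence FROM entry WHERE display_id = ?", (display_id,)
        ).fetchone()
        return row[0] if row else None


def read_fasta(stream):
    """Yield (display_id, sequence) pairs from a FASTA-formatted stream."""
    display_id, chunks = None, []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if display_id is not None:
                yield display_id, "".join(chunks)
            # First word of the header is the id; empty headers get "".
            display_id, chunks = (line[1:].split() or [""])[0], []
        elif display_id is not None:
            chunks.append(line)
    if display_id is not None:
        yield display_id, "".join(chunks)


def load_stream(stream, adaptor):
    """Store every record from the stream; return how many were loaded."""
    count = 0
    for display_id, sequence in read_fasta(stream):
        adaptor.store(display_id, sequence)
        count += 1
    return count
```

The loader knows nothing about SQL and the adaptor knows nothing about file
formats, which is the separation the adaptor design is after.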

>
> It will be able to support multiple logical "databases" (eg, swissprot and
> genbank) in the same db instance and handle multiple versions of
> sequences. It will not currently handle true entry-granularity
> mutability, but if anyone has any must-do-now aspects please shout.
>
If we can add that at a later time it would be nice, but let's see how
this part works.
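
For reference, multiple logical databases and multiple sequence versions
in one db instance could be modelled roughly as below. This is a sketch in
Python with sqlite3, not the actual bioperl-db schema; the biodatabase and
entry tables, their columns, the helper functions and the accession
"ACC001" are all made up for illustration.

```python
import sqlite3

# Hypothetical schema: entries are unique per (database, accession, version).
SCHEMA = """
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE
);
CREATE TABLE entry (
    entry_id       INTEGER PRIMARY KEY,
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase(biodatabase_id),
    accession      TEXT NOT NULL,
    version        INTEGER NOT NULL,
    sequence       TEXT NOT NULL,
    UNIQUE (biodatabase_id, accession, version)
);
"""


def add_entry(conn, db_name, accession, version, sequence):
    """Insert one versioned entry, creating the logical database if needed."""
    conn.execute(
        "INSERT OR IGNORE INTO biodatabase (name) VALUES (?)", (db_name,)
    )
    db_id = conn.execute(
        "SELECT biodatabase_id FROM biodatabase WHERE name = ?", (db_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO entry (biodatabase_id, accession, version, sequence) "
        "VALUES (?, ?, ?, ?)",
        (db_id, accession, version, sequence),
    )


def latest_version(conn, db_name, accession):
    """Return (version, sequence) for the newest version, or None."""
    return conn.execute(
        "SELECT e.version, e.sequence FROM entry e "
        "JOIN biodatabase b ON b.biodatabase_id = e.biodatabase_id "
        "WHERE b.name = ? AND e.accession = ? "
        "ORDER BY e.version DESC LIMIT 1",
        (db_name, accession),
    ).fetchone()
```

Old versions are never overwritten here, only superseded, which fits the
"no entry-granularity mutability for now" position.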
>
> I want to put code in the Bio::DBSQL:: namespace. Is this ok?
>
Sure.  Are you thinking of implementing any of the Bio::DB::SeqI
interface?  If you learn anything while doing this that would necessitate
improvements to the interfaces, please do note that.

>
> Code will go into the bioperl-db cvs module
>
>
> I have a tight-ish deadline on this, so please shout if you feel you want
> to add anything.
>
>
>
>
>
> ewan
>
>
>
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney@ebi.ac.uk>.
> -----------------------------------------------------------------
>
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/