[Biojava-l] Schema and Docs for BioSQL

Wed, 20 Feb 2002 18:16:12 -0500 (EST)

Hello Thomas,

	What did you use to generate the PostgreSQL DDL? I haven't found
anything that works very well...

				Thanks!

					-b

-----------------------
Brian Gilman <gilmanb@genome.wi.mit.edu>
Sr. Software Engineer MIT/Whitehead Inst. Center for Genome Research
One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
phone +1 617  252 1069 / fax +1 617 252 1902

On Wed, 20 Feb 2002, Thomas Down wrote:

> On Wed, Feb 20, 2002 at 01:52:10PM -0500, Marc Colosimo wrote:
> > Hi,
> > 
> > Is there any information about using the BioSQL classes in BioJava, such
> > as the schema for the database or examples in using it? I am interested in
> > using PostgreSQL and BioJava to store lots of sequence data.
> 
> BioSQL is based on bioperl-db.  There's a little bit about
> it in the document from the first (O'Reilly) hackathon meeting:
> 
>    http://www.technophage.com/open-bio-database.pdf
> 
> The BioJava code's quite new -- I've got a little tutorial
> planned, but I'm afraid (ahem) it's not written yet.
> 
> In the meantime, the code is integrated into the main
> trunk version of biojava-live (although it didn't quite
> make it into 1.2), and hopefully shouldn't be too
> problematic to use (touch wood!).
> 
> You can get schemas (MySQL and PostgreSQL) from:
> 
>    http://www.biojava.org/download/biosql/
> 
> Right now, there are actually two PostgreSQL schemas --
> one was auto-generated from the MySQL one, the other was
> hand-edited by me (identified by the -thomasd suffix).
> For now, I'd recommend the hand-edited version, but it
> should go away in future once the automated conversion has
> been perfected.
> 
> If you're using PostgreSQL, note the following:
> 
>   - You need at least version 7.1 -- previous versions didn't
>     support storing large strings in normal table attributes.
> 
>   - There's a file of stored procedures (biosqlprocs.sql)
>     which you can load into the database after loading the
>     schema.  These are auto-detected by the BioJava code,
>     and can increase write performance by a significant 
>     amount (a factor of 3, using my test setup).
> 
> 
> On the BioJava side, there isn't really any API for BioSQL
> as such.  You can just do something like:
> 
>   SequenceDB seqs = new BioSQLSequenceDB(
>       "jdbc:postgresql://dbbox.mydomain.org/biosql_db",
>       "username",
>       "password",
>       "database-name",
>       true
>   );
> 
> The first three arguments are just standard JDBC-style database
> connection details.  There's a `database name' parameter because
> BioSQL allows each `physical' SQL database to contain a number of
> `logical' databases.  Perhaps namespace would be a better term
> for these (but hey, I didn't write the original schema).  The final
> argument specifies whether the namespace should be created if it
> doesn't already exist.  Note that right now, the BioJava code
> won't create the actual SQL database, or load the schema, for you.
> You'll have to do this manually using your database's normal tools.
> 
> Having connected to the database, you can write complete
> Sequence entries using the addSequence(Sequence) method.
> 
> You can retrieve sequences by ID using the getSequence(String)
> method.  Objects extracted by this method retain live connections
> to the database.  Alterations to the sequence (for instance,
> using the createFeature(Feature.Template) method) are immediately
> reflected in the database (in a transactionally safe manner, if
> the database supports this -- PostgreSQL does).  So they're true
> persistent implementations of the BioJava interfaces.
> 
> The aim is to have everything work just like in-memory
> SequenceDB, Sequence, and Feature objects.  For many purposes,
> BioSQL is now pretty close to this ideal.
> 
> Basic BioSQL doesn't support hierarchical features, so these
> get flattened when adding a sequence to a database (and attempts
> to create new child features on a BioSQL sequence will fail).
> However, I've got an /experimental/ extension for handling
> this.   There's an extra table (seqfeature_hierarchy) in my
> schema.  Once again, this is autodetected by the client code
> and used if available.
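
The flattening described above can be pictured with a small, self-contained
Java sketch. This is not BioJava code: FlattenSketch, Node, flatten and demo
are hypothetical names made up for illustration, and the depth-first ordering
is an assumption rather than anything the BioSQL client is known to guarantee.
It only shows the general idea of turning a feature tree into a flat list in
which parent/child links are lost.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: a toy model of flattening nested features.
class FlattenSketch {

    // Hypothetical stand-in for a feature; NOT the BioJava Feature interface.
    static final class Node {
        final String type;
        final int start;
        final int end;
        final List<Node> children = new ArrayList<Node>();

        Node(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }

        // Attach a child feature and return this node so calls can be chained.
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Walk each tree depth-first and emit one flat row per feature.
    // The parent/child relationship is not recorded in the output.
    static List<String> flatten(List<Node> roots) {
        List<String> rows = new ArrayList<String>();
        for (Node root : roots) {
            collect(root, rows);
        }
        return rows;
    }

    private static void collect(Node node, List<String> rows) {
        rows.add(node.type + ":" + node.start + ".." + node.end);
        for (Node child : node.children) {
            collect(child, rows);
        }
    }

    // Build a gene with two exon children and flatten it.
    static List<String> demo() {
        Node gene = new Node("gene", 100, 900)
                .add(new Node("exon", 100, 300))
                .add(new Node("exon", 500, 900));
        return flatten(Collections.singletonList(gene));
    }

    public static void main(String[] args) {
        // prints [gene:100..900, exon:100..300, exon:500..900]
        System.out.println(demo());
    }
}
```

In this toy model the two exons end up as ordinary top-level rows next to
the gene, so nothing in the output says they once belonged to it. Keeping
that relationship is the kind of information an extra table such as
seqfeature_hierarchy would need to hold.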
> 
> 
> Let me know how you get on,
> 
>     Thomas.