[BioSQL-l] release preparation

Richard HOLLAND hollandr at gis.a-star.edu.sg
Mon Apr 18 01:49:15 EDT 2005


I will read schema-overview.txt and see what needs changing, if
anything. Do you have a deadline for the release that I should work
towards?

I don't see why the HowTo shouldn't be included. It went on the BioJava
site at the time as that seemed the logical home for it, but it is of
course equally at home on the BioSQL site.

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------


> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net] 
> Sent: Monday, April 18, 2005 1:39 PM
> To: Richard HOLLAND
> Cc: Biosql
> Subject: Re: [BioSQL-l] release preparation
> 
> 
> First off, before going through your HowTo document, as for the 
> description of which content is supposed to go where, have 
> you read the 
> doc/schema-overview.txt in the biosql repository? Could you list the 
> questions that that document leaves open? I'd rather expand that 
> document than writing another one from scratch; I thought Aaron did a 
> pretty good job towards your request, but certainly this can improved 
> or spiked with more details or whatever you find it could do 
> better on.
> 
> Now to the HowTo. BTW is there a reason this should not be 
> included in 
> the distribution?
> 
> > /BioJava and BioSQL/Oracle HOWTO
> >
> > What you'll need
> >
> > Bio*
> >
> > You'll need the latest version of BioJava to take advantage of the 
> > full functionality of BioSQL. This can be downloaded from 
> biojava.org 
> > . You'll also need the latest Oracle BioSQL schema. Here you have a 
> > choice of two options:
> > Original : by Hilmar Lapp, the original BioSQL schema takes full 
> > advantage of Oracle's security mechanisms and produces a 
> complex but 
> > high quality schema. You'll need sysdba access to your database to 
> > install it.
> 
> I'd appreciate if this could be straightened out a bit, as you really 
> do not need sysdba access if you're not going to create 
> tablespaces and 
> users, and not doing these steps is a simple matter of commenting out 
> the respective lines.
> 
> If you are though then having access to sysdba or access to 
> someone who 
> does (i.e., pair-programming with your DBA for this task) is kind of 
> unavoidable ...
> 
> Also, the distinction of a 'complex schema' coming out of the 
> original 
> and 'simplified structure' of Len's version sounds a bit too 
> misleading 
> for me, since the schema is no different between either 
> version; there 
> is no difference in number of tables or constraints or 
> whatever (or is 
> there?).
> 
> What simplified structure might refer to is that Len's version leaves 
> out the PL/SQL packages etc? Again, just as a note, this is 
> trivial to 
> disable in BS-create-all, just comment out the respective steps.
> 
> As another note, in most Oracle environments an installer 
> will not have 
> sysdba access nor will she be supposed to create tablespaces 
> or users; 
> the DBA will do it for her. In those environments, the scriptlet that 
> does this step will serve merely as an instructional template for the 
> DBA for what to create. I.e., in usual Oracle environments 
> tablespace, 
> user, and role creation will be commented out because the DBA 
> does them 
> (has done them already).
> 
> >  Go to cvs.open-bio.org , select the biosql project, and 
> navigate to 
> > and download the entire biosql-schema/sql/biosql-ora folder.
> > Simplifed : by Len Trigg, this version is simplified in 
> structure and 
> > sits entirely inside a single user account, requiring no 
> sysdba access 
> > to install. You'll have to ask for a copy of the script from the 
> > biosql-l mailing lists.
> > Both options are fully functional and compatible with both 
> BioJava and 
> > BioPerl.
> >
> > Oracle
> >
> > Obviously, you'll need an Oracle database. For the Original schema, 
> > you'll also need sysdba access, or get your DBA to help you 
> if you do 
> > not have this yourself.
> > For the Simplified schema you just need your own login to 
> Oracle, and 
> > the permissions to create tables. You'll also need to know the 
> > tablespace name to use, ask your DBA.
> >
> > Bugfixing
> >
> > NOTE: Some of these fixes may already have been made by the 
> time you 
> > read this, so be careful and check they have not already been done!
> >
> > Original schema
> >
> > Before you do anything else, you'll need to ensure that all the 
> > scripts in the folder refer to the correct local settings 
> file. This 
> > is not always the case, so be careful. The best thing to do is a 
> > global search on all the files you downloaded, and replace all 
> > references to BS-defs with BS-defs-local .
> 
> I've done this a while ago and think there's no instances left where 
> this hasn't been changed. Please check.
> 
> >  Of course, don't do this in BS-defs.sql itself.
> >
> > Now you'll need to find the CREATE TABLE SG_Biosequence 
> statement in 
> > BS-DDL.sql . You'll notice there is a constraint there called 
> > Alphabet4 . The values in the constraint ( dna ,protein 
> etc.) are all 
> > in lower case. BioJava uses upper case values for these fields, but 
> > BioPerl uses lower case! To make it work with BioJava, 
> you'll have to 
> > modify the constraint line so that it reads like this:
> > CONSTRAINT Alphabet4
> >       CHECK (lower(Alphabet) IN ('dna', 'protein', 'protein-term', 
> > 'rna')),
> 
> I've changed this but by enumerating all allowed terms so case-mixing 
> within a term isn't allowed. I haven't included 'protein-term' yet; 
> what is this? Is it necessary? What does it denote?
> 
> >
> > This of course will make BioJava work, but will stop BioPerl from 
> > being able to retrieve records correctly as it will not 
> recognise the 
> > upper-case versions of these values. One day hopefuly the 
> two projects 
> > will come up with a resolution to this issue.
> 
> I've changed this in bioperl-db so that a retrieved alphabet term is 
> converted to lower case. (This doesn't make Biojava work with 
> Bioperl-db-inserted data yet though :-)
> 
> >
> > In BS-create-Biosql-usersyns.sql you need to add another 
> command under 
> > the list of set commands at the top. This command should read:
> > set lines 200
> 
> Fixed, thanks for reporting.
> 
> > What this does is to temporarily increase the maximum length of am 
> > output line in Oracle, whilst it is creating the 
> usersyns.sql script. 
> > If you do not do this, the generated script will contain linebreaks 
> > midway through names of tables, which will cause the script to fail.
> >
> > Last of all, unless this has already been fixed in the CVS 
> versions of 
> > BioSQL by the time you read this, there is a section at the end of 
> > BS-grants.sql which grants permissions to the various 
> BioSQL users to 
> > see the SG_User table. The statement currently reads like this:
> > --
> >    -- Biosql grants for SG_USER: needs select on all views 
> and synonyms
> >    -- that don't follow the SG% convention.
> >    --
> >    SELECT 'GRANT SELECT ON ' || object_name || ' TO &biosql_user;'
> >    FROM user_objects
> >    WHERE object_name NOT LIKE 'SG_%'
> >    AND   object_name NOT LIKE '%$%'
> >    AND   object_name NOT LIKE '%_PK_SEQ'
> >    AND   object_type IN ('VIEW','SYNONYM')
> >    ;
> > You need to comment out the line that reads AND   
> object_name NOT LIKE 
> > '%_PK_SEQ' by putting two dashes ( -- ) before it. This allows the 
> > users to see the sequence required to allow them to generate new 
> > records in the database.
> 
> Note that the original statement is correct because SG_USER (or 
> whatever you define biosql_user to be) is supposed to be 
> read-only and 
> should never generate new records in the database. SG_LOADER, or 
> whatever you set biosql_loader to be, is for r/w access and 
> should get 
> proper permissions to the sequences.
> 
> Of course you are free to dispose of the distinction between a 
> read-only and a r/w user for your instance, but I don't think that 
> should be the default ... BTW there is nothing that stops you from 
> defining biosql_user and biosql_loader to the exact same user to 
> achieve this very effect.
> 
> Let me know if I'm missing something here ...
> 
> >
> > Simplified schema
> >
> > The only fix to make here is to do with the maximum value 
> allowed in a 
> > bioentry qualifier. Find the statement that creates the table 
> > BioEntry_Qualifier_Value and alter the definition for the 
> VALUE column 
> > so that it has a maximum size of 300.
> 
> Note that in the standard schema this is a VARCHAR2(4000) meanwhile.
> 
> >
> > Installation
> >
> > Original schema
> >
> > Make sure you have set the $ORACLE_SID environment variable to the 
> > correct database before running the scripts, as they 
> > connect/disconnect several times and if it is not set, you 
> may end up 
> > running them against the wrong database.
> 
> Again, if the roles, user, and tablespace creation steps are 
> commented 
> out there should be no reconnecting. At least theoretically ...
> 
> >
> > The installation requires the creation of three tablespaces 
> - one for 
> > data, one for indexes, one for LOB objects.
> 
> Again note that there is nothing that stops you from defining 
> all three 
> in BS-defs-local to the same tablespace (or two) which already exist. 
> (If you define them to the same it should exist already as the 
> tablespace creation script does assume that they are different.)
> 
> I kind of tried to write it such that you can do it 'complicated' if 
> you want and simple if you don't ... maybe I should have pointed that 
> out better.
> 
> > Decide where you will be keeping the database files for these, and 
> > what you will call the tablespaces. Don't create them yet 
> though, just 
> > write down the names. As always it is good practice to keep 
> the data 
> > and index tablespaces on separate disks to prevent IO 
> bottlenecks, but 
> > you can probably safely put the data and LOB tablespaces on 
> the same 
> > disk.
> >
> > You will also need to decide on names for the two basic roles that 
> > BioSQL uses - the base_user role which contains just enough 
> privileges 
> > to connect to the database, and the schema_creator role, which 
> > contains the privileges required to create database objects in a 
> > schema. Again, don't create them just yet.
> >
> > Now, copy BS-defs.sql to BS-defs-local.sql and edit it. You should 
> > check every entry in it carefully, particularly the names and 
> > locations of the tablespace files to be created, and the 
> names of the 
> > two roles you just decided on above. You will also choose names for 
> > the various default BioSQL roles. biosql_owner is not a 
> role but the 
> > actual owner of the schema that will have the schema_creator role 
> > granted to it, you'll need to define its password here too. 
> > biosql_user is a role to be granted to people who need read-only 
> > access to the BioSQL database, biosql_loader is a role designed for 
> > batch upload processes, whilst biosql_admin has full read-write 
> > permission on the schema.
> 
> I guess I need to update the comments here. I ended up never 
> using the 
> biosql_admin role but using the biosql_loader role instead as the r/w 
> user. This is pretty much how permissions are granted.
> 
> So maybe do I need to include a sample BS-defs-local and 
> BS-create-all 
> with 'simplified' settings?
> 
> 	-hilmar
> 
> >
> > Once you have edited the BS-defs-local.sql script 
> appropriately, you 
> > need to create the two base roles of base_user and schema_creator 
> > manually. Create them by running something similar to the following 
> > script whilst logged in as sysdba, from inside the biosql-ora 
> > directory:
> > @BS-defs-local
> >    create role &base_user;
> >    grant
> >    CREATE SESSION,
> >    CREATE SYNONYM,
> >    CREATE VIEW
> >    to &base_user;
> >    create role &schema_creator;
> >    grant
> >    CREATE PROCEDURE,
> >    CREATE ROLE,
> >    CREATE SEQUENCE,
> >    CREATE SESSION,
> >    CREATE SYNONYM,
> >    CREATE TRIGGER,
> >    CREATE TYPE,
> >    CREATE VIEW,
> >    CREATE TABLE
> >    to &schema_creator
> >    with admin option;
> >
> > If you want some basic users set up, edit the BS-create-users.sql 
> > script to look at the sample users it will create for you 
> > automatically. If you don't want them, or want different 
> names etc., 
> > comment them out or edit them.
> >
> > The final stage before actual installation is to edit the 
> > BS-create-all.sql script to ensure that only the steps you 
> require are 
> > carried out. If you already have predefined tablespaces and 
> don't want 
> > it to create new ones, comment out the line that reads 
> > @BS-create-tablespaces . Likewise if you don't want any 
> default data 
> > loaded into the database, comment out the line near the end 
> that reads 
> > @BS-prepopulate-db .
> >
> > Under section 8 of BS-create-all.sql you need to make sure the 
> > following commands appear in the order below. If they appear in any 
> > other order, you will not be able to create other users to 
> access the 
> > database later! The commands should read:
> > @BS-create-roles
> >    @BS-create-synonyms
> >    @BS-create-Biosql-API2
> >    @BS-create-Biosql-usersyns
> >    @BS-grants
> > (NOTE: The BS-create-Biosql-API2 script is an alternative to 
> > BS-create-Biosql-API which works much better with BioJava. This is 
> > because BioJava has no flexibility about column names in 
> tables. The 
> > API2 version of the script ensures that the column names 
> are exactly 
> > the same as what BioJava expects by using synonyms. But, no matter 
> > which you run, everything will still work fine with BioPerl).
> >
> > Now, log in to the database as sysdba from inside the biosql-ora 
> > directory. Create the BioSQL database by typing:
> > @BS-create-all
> > . You might want to spool the output to see what happens, 
> but you'll 
> > find that half of it doesn't appear in the spool file, 
> because BioSQL 
> > is using spool itself to generate dynamic scripts on the fly. If 
> > you've done everything right, the only messages you should 
> get are a 
> > few Table or view does not exist style messages, referring to the 
> > attempts by the script to drop old objects before 
> recreating new ones.
> >
> > During installation you will be prompted for the sysdba 
> username and 
> > password several times. This is required to create tablespaces and 
> > users.
> >
> > If something goes wrong, you can safely rerun the script without 
> > dropping anything first as it will drop the database 
> objects from the 
> > previous attempt first. It will however leave behind the 
> tablespaces, 
> > users, and roles. You can always just drop the users and 
> tablespaces 
> > that have been created if it really messes up, and start again from 
> > scratch.
> >
> > Now,  your database has been installed! The only remaining 
> step is to 
> > log in to each user who will be using BioSQL, and run the 
> usersyns.sql 
> > script that the installation generated for you in the biosql-ora 
> > directory. This script creates the synonyms for the BioSQL 
> objects and 
> > allows the users to see them. This script should not have 
> any errors 
> > at all. If it does, edit it and check it closely for things like 
> > misplaced linebreaks etc.
> >
> > Note that Oracle sometimes has issues with roles and does not 
> > apparently grant them correctly. If this happens, you will need to 
> > grant the appropriate roles to the individual users 
> manually (see the 
> > short create role script above) and rerun the usersyns.sql script. 
> > Sometimes you will find they don't even have the appropriate 
> > tablespace quotas on the three BioSQL tablespaces. You'll need to 
> > grant these tablespace quotas using the alter user <bloggs> quota 
> > unlimited on <tablespace> command.
> >
> > Simplified schema
> >
> > NOTE: You will have to do a global search-and-replace on 
> this script 
> > to replace the two tablespace names with the ones you will 
> actually be 
> > using. Check with your DBA. This version of the schema only has two 
> > tablespaces - one for data, the other for indexes.
> >
> > This is much easier to set up than the Original schema. 
> Simply log in 
> > as the user you wish to install BioSQL as, ensure that your DBA has 
> > granted that user the same rights as for the schema_creator role 
> > described in the Original installation instructions above, then 
> > execute the single script that defines the schema. You 
> should have no 
> > problems. You can spool the output to a file if you like to 
> be able to 
> > check the results.
> >
> > This schema is a one-user-only schema, where all users log 
> in as the 
> > schema owner and have full read/write access to the entire 
> database. 
> > This is the most important difference between this schema and the 
> > Original .
> >
> > Testing
> >
> > Any BioJava script should work fine!
> >
> > THE END!
> >
> > Richard Holland, hollandr at gis dot a-star dot edu dot sg, 
> December 
> > 2004
> 
> On Sunday, April 17, 2005, at 06:07  PM, Richard HOLLAND wrote:
> 
> > The only issues I have are with the Oracle installation, 
> which I came
> > across whilst writing the Oracle BioSQL howto at
> > 
> http://www.biojava.org/docs/bj_in_anger/bj_and_bsql_oracle_howto.htm -
> > the issues are mentioned in that article. If they have been 
> resolved or
> > are no longer relevant, then I'd consider it ready for release.
> >
> > However as part of the release I'd really appreciate a document
> > describing exactly what is supposed to be stored in each 
> column/table
> > (just supposed to be - doesn't have to be the way any 
> particular Bio*
> > project actually does it). This would be very helpful in 
> the efforts to
> > unite the various Bio* projects and make them all use the 
> same tables
> > for the same things (which is not always the case at present).
> >
> > cheers,
> > Richard
> >
> > Richard Holland
> > Bioinformatics Specialist
> > GIS extension 8199
> > ---------------------------------------------
> > This email is confidential and may be privileged. If you are not the
> > intended recipient, please delete it and notify us 
> immediately. Please
> > do not copy or use it for any purpose, or disclose its 
> content to any
> > other person. Thank you.
> > ---------------------------------------------
> >
> >
> >> -----Original Message-----
> >> From: biosql-l-bounces at portal.open-bio.org
> >> [mailto:biosql-l-bounces at portal.open-bio.org] On Behalf Of 
> Hilmar Lapp
> >> Sent: Sunday, April 17, 2005 4:31 AM
> >> To: Biosql
> >> Subject: [BioSQL-l] release preparation
> >>
> >>
> >> I've issued this call earlier and I believe have implemented all
> >> suggestions. To be sure, please let me know if you have any
> >> issues with
> >> the schema or instantiation or if you know of any that should be
> >> addressed before releasing 1.0.
> >>
> >> Other than that Brian has updated the PostgreSQL generated ERD HTML
> >> document so that everything should be up to date and ready to go.
> >>
> >> So please let me know and otherwise I'll target release for
> >> the end of
> >> this month.
> >>
> >> 	-hilmar
> >> -- 
> >> -------------------------------------------------------------
> >> Hilmar Lapp                            email: lapp at gnf.org
> >> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> >> -------------------------------------------------------------
> >>
> >> _______________________________________________
> >> BioSQL-l mailing list
> >> BioSQL-l at open-bio.org
> >> http://open-bio.org/mailman/listinfo/biosql-l
> >>
> >
> >
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> 
> 
> 



More information about the BioSQL-l mailing list