[BioSQL-l] release preparation

Richard HOLLAND hollandr at gis.a-star.edu.sg
Mon Apr 18 21:45:33 EDT 2005


Don't worry, I do know how to do it, it's just that in the existing
BioJava-live code it hasn't been done, and I'll need to be careful to
add the usual checks to see if we're using Oracle or not before choosing
the appropriate SQL to update biosequence with.

CLOBs under 4000 chars are certainly easier, but over 4000 you have to
be careful, and there is a bug which prevents clob.getSubstring()
working for any position greater than can be described in 16 bits
(although I know I experienced this one before, I can't find a reference
to it now....). You then have to use the clob's Stream accessors
instead, but it's not a problem really. Yes, I know 16-bits (65k bases
or so) is huge, but in our current BioSQL all our sequences are around
the 10000 base length so the 4000-char limited accessor methods are not
an option.

Len's suggestion of having table helpers in BioJava to check which
version to use and therefore maintain backwards compatibility is a good
one. It's slightly more work, but not too much to warrant a major panic
attack. I'll let you know when biojava-live has been updated.

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------


> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net] 
> Sent: Tuesday, April 19, 2005 12:56 AM
> To: Richard HOLLAND
> Cc: Len Trigg; mark.schreiber at novartis.com; Biosql; biojava-list List
> Subject: Re: [BioSQL-l] release preparation
> 
> 
> CLOB is IMHO actually easier to handle. Also, LONG is really odd to 
> deal with in SQL whereas the Oracle server will nicely on-the-fly 
> convert strings to CLOB and vice versa so long as they are 
> shorter than 
> 4000 chars. Some of the type-generic functions that come with Oracle 
> will not accept LONG but do accept CLOB. Just as another anecdotal 
> piece, the built-in BLAST searcher available in Oracle 10g expects a 
> cursor returning CLOBs, not LONGs.
> 
> With the java.sql.Clob interface to get at the full value as a string 
> is as simple as
> 
> 	Clob clob = resultSet.getClob(<your column index here>);
> 	String clobValue = clob.getSubString(0, clob.length());
> 
> Inserting a new value in reality is a two-step process:
> 
> 	PreparedStatement pst = conn.prepareStatement("INSERT 
> INTO Biosequence 
> (Bioentry_Id, Seq) VALUES (?, EMPTY_CLOB())");
> 	pst.executeUpdate(idValue);
> 	pst = conn.prepareStatement("SELECT Seq FROM Biosequence WHERE 
> Bioentry_Id = ? FOR UPDATE", ResultSet.TYPE_FORWARD_ONLY, 
> ResultSet.CONCUR_UPDATABLE);
> 
> 	ResultSet rs = pst.executeQuery(idValue);
> 	Clob clob = rs.getClob(1);
> 	clob.setString(0, theSeq);
> 	// not sure this is necessary
> 	rs.updateClob(1, clob);
> 	rs.close();
> 	// don't forget to release lock
> 	conn.commit();
> 	
> I vaguely remember that Len or somebody else from the Biojava 
> crowd had 
> this all figured out?
> 
> 	-hilmar
> 
> On Monday, April 18, 2005, at 02:08  AM, Richard HOLLAND wrote:
> 
> > I looked into this in a bit more detail earlier today and 
> found that,
> > since some version of Oracle around the 9i point in time, 
> the official
> > Oracle JDBC driver API for accessing LOBs in changed. This 
> means that
> > whereas before the same code could be used in BioJava to access both
> > Hilmar's and Len's versions of the database, since the 9i 
> drivers this
> > has no longer been possible, and BioJava only works with 
> Len's version.
> > The problem is due to the way in which Len's schema uses 
> LONG values 
> > for
> > biosequence.seq, but Hilmar's uses CLOBs.
> >
> > (The nitty gritty - before 9i, Oracle JDBC allowed you to 
> access both
> > LONG and CLOB columns using getString()/setString() methods to
> > manipulate them. Now, these methods only work with LONG 
> columns, and 
> > you
> > have to do fancy tricks to get anything useful into/out of CLOBs).
> >
> > After discussing this with Mark earlier this afternoon, I 
> am planning 
> > on
> > changing BioJava to use the new Oracle CLOB API, at which 
> point it will
> > no longer work with schemas set up using Len's version. No change to
> > BioSQL is required. This, from a BioJava point of view, 
> would make the
> > simple schema redundant. I am not sure if there are people 
> in the other
> > Bio* projects who use the simple schema though so we probably can't 
> > just
> > drop it.
> >
> > Are there any objections? I have crossposted this to the 
> BioJava list 
> > to
> > make sure everyone who might be affected gets a say.
> >
> > cheers,
> > Richard
> >
> > Richard Holland
> > Bioinformatics Specialist
> > GIS extension 8199
> > ---------------------------------------------
> > This email is confidential and may be privileged. If you are not the
> > intended recipient, please delete it and notify us 
> immediately. Please
> > do not copy or use it for any purpose, or disclose its 
> content to any
> > other person. Thank you.
> > ---------------------------------------------
> >
> >
> >> -----Original Message-----
> >> From: biosql-l-bounces at portal.open-bio.org
> >> [mailto:biosql-l-bounces at portal.open-bio.org] On Behalf Of 
> Len Trigg
> >> Sent: Monday, April 18, 2005 4:58 PM
> >> To: mark.schreiber at novartis.com
> >> Cc: Hilmar Lapp; Biosql
> >> Subject: Re: [BioSQL-l] release preparation
> >>
> >>
> >>
> >> Mark Schreiber wrote:
> >>> now, my bad! Agreed that from a SQL query perspective the
> >> schemas are the
> >>> same, one just has more complexity (if I can call it that)
> >> under the hood.
> >>
> >> Indeed, the complexity is more to do with the complexity 
> of installing
> >> and understanding what's going on in all those files :-) 
> (particularly
> >> if you are not an oracle expert and have only been looking at the
> >> BioSQL schemas for the other supported databases), and that's why I
> >> did the simple version.  That's partly confirmed by the 
> fact that the
> >> bjia description of how to use the original schema is 
> about 8KB, while
> >> the description for the simple schema is about 1KB.  I'm all for
> >> dumping the simple one if the barrier for entry for the original
> >> schema is lowered (maybe it already has been).
> >>
> >>
> >>> I would prefer to keep instructions for the less complex
> >> version up for
> >>> the time being as we are having difficulties getting
> >> biojava to work
> >>> seamlessly with the more complex version. This is almost
> >> certainly a
> >>> failing of biojava for which the oracle support seems to have been
> >>> compiled against the 'simple' schema not the 'complex schema'.
> >>
> >> It certainly was only tested against the simple version, because
> >> that's the only schema I had working when I wrote the 
> Oracle support.
> >> I am a little surprised that you are having major 
> difficulties though,
> >> since the original package has a compatibility layer that 
> (supposedly)
> >> presents the same schema as the simple version.
> >>
> >>
> >>> I expect we will soon have biojava supporting your version
> >> and we can drop
> >>> the 'simple' schema. After all, there is not much point
> >> using oracle if
> >>> you don't make use of the features.
> >>
> >> In my case, it was a matter of using Oracle because that 
> was what was
> >> already installed :-)
> >>
> >>
> >> Cheers,
> >> Len.
> >>
> >> _______________________________________________
> >> BioSQL-l mailing list
> >> BioSQL-l at open-bio.org
> >> http://open-bio.org/mailman/listinfo/biosql-l
> >>
> >
> >
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> 
> 
> 



More information about the BioSQL-l mailing list