[BioSQL-l] release preparation

Mon Apr 18 22:19:30 EDT 2005

Sounds good. BTW note that 65k is not large at all even on the 
transcript level; the infamous titin and some similar genes have longer 
transcripts. Maybe Dengue doesn't have titin but still ... ;)

	-hilmar

On Apr 18, 2005, at 6:45 PM, Richard HOLLAND wrote:

> Don't worry, I do know how to do it, it's just that in the existing
> BioJava-live code it hasn't been done, and I'll need to be careful to
> add the usual checks to see if we're using Oracle or not before 
> choosing
> the appropriate SQL to update biosequence with.
>
> CLOBs under 4000 chars are certainly easier, but over 4000 you have to
> be careful, and there is a bug which prevents clob.getSubstring()
> working for any position greater than can be described in 16 bits
> (although I know I experienced this one before, I can't find a 
> reference
> to it now....). You then have to use the clob's Stream accessors
> instead, but it's not a problem really. Yes, I know 16-bits (65k bases
> or so) is huge, but in our current BioSQL all our sequences are around
> the 10000 base length so the 4000-char limited accessor methods are not
> an option.
>
> Len's suggestion of having table helpers in BioJava to check which
> version to use and therefore maintain backwards compatibility is a good
> one. It's slightly more work, but not too much to warrant a major panic
> attack. I'll let you know when biojava-live has been updated.
>
> cheers,
> Richard
>
> Richard Holland
> Bioinformatics Specialist
> GIS extension 8199
> ---------------------------------------------
> This email is confidential and may be privileged. If you are not the
> intended recipient, please delete it and notify us immediately. Please
> do not copy or use it for any purpose, or disclose its content to any
> other person. Thank you.
> ---------------------------------------------
>
>
>> -----Original Message-----
>> From: Hilmar Lapp [mailto:hlapp at gmx.net]
>> Sent: Tuesday, April 19, 2005 12:56 AM
>> To: Richard HOLLAND
>> Cc: Len Trigg; mark.schreiber at novartis.com; Biosql; biojava-list List
>> Subject: Re: [BioSQL-l] release preparation
>>
>>
>> CLOB is IMHO actually easier to handle. Also, LONG is really odd to
>> deal with in SQL whereas the Oracle server will nicely on-the-fly
>> convert strings to CLOB and vice versa so long as they are
>> shorter than
>> 4000 chars. Some of the type-generic functions that come with Oracle
>> will not accept LONG but do accept CLOB. Just as another anecdotal
>> piece, the built-in BLAST searcher available in Oracle 10g expects a
>> cursor returning CLOBs, not LONGs.
>>
>> With the java.sql.Clob interface to get at the full value as a string
>> is as simple as
>>
>> 	Clob clob = resultSet.getClob(<your column index here>);
>> 	String clobValue = clob.getSubString(0, clob.length());
>>
>> Inserting a new value in reality is a two-step process:
>>
>> 	PreparedStatement pst = conn.prepareStatement("INSERT
>> INTO Biosequence
>> (Bioentry_Id, Seq) VALUES (?, EMPTY_CLOB())");
>> 	pst.executeUpdate(idValue);
>> 	pst = conn.prepareStatement("SELECT Seq FROM Biosequence WHERE
>> Bioentry_Id = ? FOR UPDATE", ResultSet.TYPE_FORWARD_ONLY,
>> ResultSet.CONCUR_UPDATABLE);
>>
>> 	ResultSet rs = pst.executeQuery(idValue);
>> 	Clob clob = rs.getClob(1);
>> 	clob.setString(0, theSeq);
>> 	// not sure this is necessary
>> 	rs.updateClob(1, clob);
>> 	rs.close();
>> 	// don't forget to release lock
>> 	conn.commit();
>> 	
>> I vaguely remember that Len or somebody else from the Biojava
>> crowd had
>> this all figured out?
>>
>> 	-hilmar
>>
>> On Monday, April 18, 2005, at 02:08  AM, Richard HOLLAND wrote:
>>
>>> I looked into this in a bit more detail earlier today and
>> found that,
>>> since some version of Oracle around the 9i point in time,
>> the official
>>> Oracle JDBC driver API for accessing LOBs in changed. This
>> means that
>>> whereas before the same code could be used in BioJava to access both
>>> Hilmar's and Len's versions of the database, since the 9i
>> drivers this
>>> has no longer been possible, and BioJava only works with
>> Len's version.
>>> The problem is due to the way in which Len's schema uses
>> LONG values
>>> for
>>> biosequence.seq, but Hilmar's uses CLOBs.
>>>
>>> (The nitty gritty - before 9i, Oracle JDBC allowed you to
>> access both
>>> LONG and CLOB columns using getString()/setString() methods to
>>> manipulate them. Now, these methods only work with LONG
>> columns, and
>>> you
>>> have to do fancy tricks to get anything useful into/out of CLOBs).
>>>
>>> After discussing this with Mark earlier this afternoon, I
>> am planning
>>> on
>>> changing BioJava to use the new Oracle CLOB API, at which
>> point it will
>>> no longer work with schemas set up using Len's version. No change to
>>> BioSQL is required. This, from a BioJava point of view,
>> would make the
>>> simple schema redundant. I am not sure if there are people
>> in the other
>>> Bio* projects who use the simple schema though so we probably can't
>>> just
>>> drop it.
>>>
>>> Are there any objections? I have crossposted this to the
>> BioJava list
>>> to
>>> make sure everyone who might be affected gets a say.
>>>
>>> cheers,
>>> Richard
>>>
>>> Richard Holland
>>> Bioinformatics Specialist
>>> GIS extension 8199
>>> ---------------------------------------------
>>> This email is confidential and may be privileged. If you are not the
>>> intended recipient, please delete it and notify us
>> immediately. Please
>>> do not copy or use it for any purpose, or disclose its
>> content to any
>>> other person. Thank you.
>>> ---------------------------------------------
>>>
>>>
>>>> -----Original Message-----
>>>> From: biosql-l-bounces at portal.open-bio.org
>>>> [mailto:biosql-l-bounces at portal.open-bio.org] On Behalf Of
>> Len Trigg
>>>> Sent: Monday, April 18, 2005 4:58 PM
>>>> To: mark.schreiber at novartis.com
>>>> Cc: Hilmar Lapp; Biosql
>>>> Subject: Re: [BioSQL-l] release preparation
>>>>
>>>>
>>>>
>>>> Mark Schreiber wrote:
>>>>> now, my bad! Agreed that from a SQL query perspective the
>>>> schemas are the
>>>>> same, one just has more complexity (if I can call it that)
>>>> under the hood.
>>>>
>>>> Indeed, the complexity is more to do with the complexity
>> of installing
>>>> and understanding what's going on in all those files :-)
>> (particularly
>>>> if you are not an oracle expert and have only been looking at the
>>>> BioSQL schemas for the other supported databases), and that's why I
>>>> did the simple version.  That's partly confirmed by the
>> fact that the
>>>> bjia description of how to use the original schema is
>> about 8KB, while
>>>> the description for the simple schema is about 1KB.  I'm all for
>>>> dumping the simple one if the barrier for entry for the original
>>>> schema is lowered (maybe it already has been).
>>>>
>>>>
>>>>> I would prefer to keep instructions for the less complex
>>>> version up for
>>>>> the time being as we are having difficulties getting
>>>> biojava to work
>>>>> seamlessly with the more complex version. This is almost
>>>> certainly a
>>>>> failing of biojava for which the oracle support seems to have been
>>>>> compiled against the 'simple' schema not the 'complex schema'.
>>>>
>>>> It certainly was only tested against the simple version, because
>>>> that's the only schema I had working when I wrote the
>> Oracle support.
>>>> I am a little surprised that you are having major
>> difficulties though,
>>>> since the original package has a compatibility layer that
>> (supposedly)
>>>> presents the same schema as the simple version.
>>>>
>>>>
>>>>> I expect we will soon have biojava supporting your version
>>>> and we can drop
>>>>> the 'simple' schema. After all, there is not much point
>>>> using oracle if
>>>>> you don't make use of the features.
>>>>
>>>> In my case, it was a matter of using Oracle because that
>> was what was
>>>> already installed :-)
>>>>
>>>>
>>>> Cheers,
>>>> Len.
>>>>
>>>> _______________________________________________
>>>> BioSQL-l mailing list
>>>> BioSQL-l at open-bio.org
>>>> http://open-bio.org/mailman/listinfo/biosql-l
>>>>
>>>
>>>
>> -- 
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>>
>>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------