[BioSQL-l] value field in seqfeature_qualifier_value too small

Hilmar Lapp hlapp at gmx.net
Wed Jun 7 17:15:49 UTC 2006


You may make this change in your installation, and I should maybe  
document this possibility in the schema.

However, there are some serious issues with doing so. First off, you  
will no longer be able to search by simple query (you'd have to use  
DBMS_LOB functions), and you can't index the column any more.  
Furthermore, all language bindings will need special code to use the  
stream or other LOB API for getting/setting the value. There's  
additional issues that LOB columns raise compared to VARCHAR types.

It's worth noting that this applies so far only to Oracle (the other  
RDBMSs do not impose the length restriction), and it's in reality  
only the value of the protein translation feature tag that exceeds  
the limit on occasion.

I personally strip the protein translation tag from all loaded  
Genbank and RefSeq records because the value may be computed easily  
on the fly, and the respective sequences are usually there too unless  
you don't load Genpept or the NP_/XP_* part of RefSeq - but in this  
case I don't understand why you would want the tag value but not the  
full sequence.

Not the easy answer you may have hoped for, and I'd welcome any  
suggestions for how the problem could be solved w/o using CLOBs.

	-hilmar

On Jun 7, 2006, at 11:45 AM, David Scott wrote:

> in seqfeature_qualifier_value:
>
>   SEQFEATURE_ID                             NOT NULL NUMBER(38)
>   TERM_ID                                   NOT NULL NUMBER(38)
>   RANK                                      NOT NULL NUMBER(3)
>   VALUE                                              VARCHAR2(4000)
>
>  the value field is restricted to 4000 bytes - 4000 bytes too small to
>  carry current genbank protein translations - e.g. AB016240.1 has a
>  protein translation of 4133 characters. since 4000 bytes is a string
>  length restriction for some jdbc drivers - to increase the field  
> size,
>  the field type should also be changed to clob.
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================








More information about the BioSQL-l mailing list