[BioSQL-l] Loading long strings to Oracle

Tue Jun 28 12:15:41 EDT 2005

On Jun 28, 2005, at 11:19 AM, Teemu Kivioja wrote:

> Hi,
>
>
> It seems that I can get rid of this error message by explicitly  
> telling that the type of sequence field is CLOB by adding the code
> 	    if ($slots[$i] eq "seq") {
> 	      $self->bind_param($sth, $j, $slotvals->[$i],
> 				{ ora_type => ORA_CLOB });
> 	    } else {
> 	      $self->bind_param($sth, $j, $slotvals->[$i]);
> 	    }

I've used that type of code before, but it wasn't necessary any more  
for INSERTs and didn't solve the problem for UPDATEs.

This may be related to the character encoding problem, in particular  
the fact that with UTF16 the byte length is no longer equal to the  
string length.

> 2.) When trying insert InterPro (interpro 10.0
> ftp://ftp.ebi.ac.uk/pub/databases/interpro/interpro.xml) I get:
>
> perl load_ontology.pl --format 'interpro' --host sboracle1.ad.vtt.fi  
> --namespace interpro  --driver Oracle --dbname BfxDB --testonly  
> --fmtargs "ontology_engine,simple" interpro.xml

Don't worry about the ontology engine. Use format interprosax, which is  
an alias to an event-based parser, that should keep the memory usage  
down.

> ...
> 11900
> Loading ontology InterPro:
>         ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values  
> were ("IPR000911","Ribosomal protein L11","Ribosomes are ...
>
> ORA-01461: can bind a LONG value only for insert into a LONG column  
> (DBD ERROR: error possibly near <*> indicator at char 14 in 'INSERT  
> INTO te<*>rm (identifier, name, definition, is_obsolete, ont_oid)  
> VALUES (:p1, :p2, :p3, :p4, :p5)')
> ---------------------------------------------------
> Could not store term IPR000911, name 'Ribosomal protein L11':
>
> ------------- EXCEPTION  -------------
> MSG: create: object (Bio::Ontology::InterProTerm) failed to insert or  
> to be found by unique key
>
> Again, the annotation is >2000 characters long but well under the 4000  
> character limit.

This may be due to the encoding problem as well as you suspect yourself.

	-hilmar

>
> 3.) As others have already reported, the memory usage can be high,   
> the above load_ontology process takes about 2.5GB of memory.
>
> I guess the fact that the problems 1 and 2 already arise with strings  
> that are <4000 chars long might be related to the local character  
> coding. The code:
>
> my $hash_ref = $dbh->ora_nls_parameters();
> my $database_charset = $hash_ref->{NLS_CHARACTERSET};	
> my $national_charset = $hash_ref->{NLS_NCHAR_CHARACTERSET};
> print "database charset: $database_charset\n";
> print "national charset: $national_charset\n";
>
> gives
>
> database charset: WE8ISO8859P1
> national charset: AL16UTF16
>
> and
>
> $  locale LC_CTYPE | head
> upper;lower;alpha;digit;xdigit;space;print;graph;blank;cntrl;punct; 
> alnum;combining;combining_level3
> toupper;tolower;totitle
> 16
> 6
> UTF-8
> 70
> 84
> 1
> 0
> 1
>
> Some details of the system:
> Enterprise Linux, 2.4.21-32.0.1.ELsmp (64-bit)
> Oracle 10g, version 10.1.0.3.0 - 64bit
> Perl, v5.8.0 built for x86_64-linux-thread-multi
> Bioperl 1.4
> bioperl-db 0.1
> DBD::Oracle 1.16
> Biosql-schema downloaded on May 10
>
> What would be the best way to solve these problems?
>
> Best regards,
> Teemu Kivioja
>
>
>
>
> ------------------------------------------------------------------
> Teemu Kivioja, Research Scientist
> VTT Biotechnology
> P.O. Box 1500, FIN-02044 VTT, Finland
> (Street address: Tietotie 2, Espoo, Otaniemi)
> Email: Teemu.Kivioja at vtt.fi
> Phone: +358 20 722 7111
> Fax: +358 20 722 7071
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------