[Bioperl-l] Re: memory error while loading SwissProt into Oracle using bioperl-db

Hilmar Lapp hlapp at gnf.org
Tue Jun 14 14:55:38 EDT 2005


On Jun 14, 2005, at 2:52 AM, Jana Bauckmann wrote:

> Hi,
>
> I would like to load SwissProt data into my Oracle 9.2 database with
> BioSQL as schema using load_seqdatabase.pl from bioperl-db. I've got two
> problems:
>
> 1) I get many (about 1300) warnings stating integrity constraint errors:
>
> ORA-02291: integrity constraint (BIOSQL_SP.FKDBX_REF) violated - parent
> key not found (DBD ERROR: OCIStmtExecute)
>
> ORA-01400: cannot insert NULL into ("BIOSQL_SP"."SG_REFERENCE"."AUTHORS")
> (DBD ERROR: OCIStmtExecute)

If the references in the affected SwissProt entries indeed have no 
authors, then the ORA-01400 error is expected, because 
Reference.Authors may not be NULL.

You should, however, see more than just the error messages above. 
There should be a warning message before or after each of them 
reporting that not all foreign keys could be inserted, and that message 
should give a primary key. This should be the primary key of the 
bioentry that the reference was supposed to be attached to. Using SQL 
you should then be able to identify which record it is, and then you 
can look it up on the Swissprot site or in your Swissprot source file.
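
The lookup described above might be sketched as follows. This is only 
an illustration, assuming the generic BioSQL table and column names 
(bioentry, bioentry_id, accession, name); the Oracle version of the 
schema prefixes its tables with SG_ (as in SG_REFERENCE in the error 
above) and its column names may differ, so adjust accordingly. The 
in-memory SQLite database, the sample row, and the key 4711 are 
stand-ins so that the snippet runs on its own.

```python
import sqlite3


def find_bioentry(conn, bioentry_id):
    """Return (accession, name) for the given primary key, or None."""
    cur = conn.execute(
        "SELECT accession, name FROM bioentry WHERE bioentry_id = ?",
        (bioentry_id,),
    )
    return cur.fetchone()


# Stand-in database; a real lookup would run against the Oracle instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bioentry ("
    "bioentry_id INTEGER PRIMARY KEY, accession TEXT, name TEXT)"
)
# Illustrative row: 4711 plays the role of the key given in the warning.
conn.execute("INSERT INTO bioentry VALUES (4711, 'P12345', 'AATM_RABIT')")

row = find_bioentry(conn, 4711)
print(row)
```

With the accession in hand, the entry can be found in the Swissprot 
source file or on the Swissprot site.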

If the bioentry itself fails to load because of this problem then you 
should see an error message to this effect, with full stack trace. 
Otherwise the bioentry did load, just the reference didn't, and if you 
don't really need this particular reference, you don't need to worry 
about it.

You may also want to consider trying to upgrade to a CVS snapshot from 
either the 1.4 branch or the main trunk. There have been a few fixes to 
modules that I believe include the swissprot parser.

>
> 2) The script stops after 2 hours (34500 tuples in table BioEntry) with
> message: Out of memory!
>
> I guess problem 1 causes problem 2. Is this reasonable or do I have two
> separated problems?

The first one may not even be a real problem; see above. It is 
extremely unlikely that it causes the memory problem.

Swissprot is a large, very diverse, and richly annotated data 
source, and because bioperl-db caches a lot of things such as ontology 
terms, references, and dbxrefs, the loader process will eventually use 
anywhere between 500MB and 1.3GB of RAM.

Given the amount of memory you have, this shouldn't be a limitation 
at all, unless you gave all the memory to Oracle running on the same 
machine.

I've had a memory leak issue with DBD::Oracle, the Oracle 9iR2 client 
library, and multi-thread enabled perl 5.8.1 on MacOSX. You may be 
seeing a similar problem. Try watching the loader process in top to 
see how fast the memory consumption grows. It will grow as the object 
cache fills up, but if you see it eating up more than 1GB before 
100,000 records have been loaded, you have likely hit a memory leak.
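
As a rough sketch of that rule of thumb (more than 1GB before 100,000 
records), the check could be written down as below. The function names, 
the sample format, and the idea of reading VmRSS from /proc instead of 
watching top are assumptions made for illustration; the /proc reader 
works on Linux only.

```python
def looks_like_leak(samples, mem_limit_mb=1024, record_limit=100_000):
    """samples: iterable of (records_loaded, resident_mb) pairs.

    Returns True if memory use exceeded mem_limit_mb while fewer than
    record_limit records had been loaded.
    """
    return any(
        records < record_limit and mem_mb > mem_limit_mb
        for records, mem_mb in samples
    )


def resident_mb(pid):
    """Read the resident set size of a process from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # value is given in kB
    return None
```

The record count for each sample could be taken from the number of rows 
in the bioentry table at the time the memory reading is made.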

If that's the case you'll have to rebuild your own perl from source 
with multi-threading disabled.
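
To confirm whether a given perl was built with thread support, one 
option is to ask perl for its usethreads configuration value, which 
`perl -V:usethreads` prints as usethreads='define'; or 
usethreads='undef';. The helper names below are made up for this 
sketch, and it assumes the perl to be checked is reachable under the 
name passed in.

```python
import subprocess


def parse_usethreads(output):
    """Interpret the output of `perl -V:usethreads`."""
    # perl prints e.g.  usethreads='define';  or  usethreads='undef';
    value = output.strip().rstrip(";").split("=", 1)[1].strip("'")
    return value == "define"


def perl_is_threaded(perl="perl"):
    """Run the given perl binary and report whether threads are enabled."""
    out = subprocess.run(
        [perl, "-V:usethreads"], capture_output=True, text=True, check=True
    ).stdout
    return parse_usethreads(out)
```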

	-hilmar

>
> I run Oracle and the load script on the same machine with:
> Suse Linux 9.0 (kernel 2.4.21-291-smp) with  12 GB RAM
> perl 5.8.1, built for i586-linux-thread-multi
> bioperl 1.4
> bioperl-db 0.1

BTW I'm assuming the bioperl-db version is not correct; otherwise the 
latest BioSQL schema wouldn't be supported, let alone the Oracle 
version of it. You probably obtained a snapshot from CVS?

> DBI 1.48
> DBD::Oracle 1.16
> Oracle 9.2
> BioSQL schema for Oracle (downloaded from http://cvs.open-bio.org/ on
> 6th June 2005)
>
> Thanks for any suggestions,
> Jana
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


