[Bioperl-l] Re: memory error while loading SwissProt into Oracle using bioperl-db
Hilmar Lapp
hlapp at gnf.org
Tue Jun 14 14:55:38 EDT 2005
On Jun 14, 2005, at 2:52 AM, Jana Bauckmann wrote:
> Hi,
>
> I would like to load SwissProt data into my Oracle 9.2 database with
> BioSQL as schema using load_seqdatabase.pl from bioperl-db. I've got
> two problems:
>
> 1) I get many (about 1300) warnings stating integrity constraint
> errors:
>
> ORA-02291: integrity constraint (BIOSQL_SP.FKDBX_REF) violated - parent
> key not found (DBD ERROR: OCIStmtExecute)
>
> ORA-01400: cannot insert NULL into
> ("BIOSQL_SP"."SG_REFERENCE"."AUTHORS")
> (DBD ERROR: OCIStmtExecute)
If there are indeed no authors for the respective reference in the
SwissProt entry, then this is expected, because Reference.Authors may
not be NULL (hence the ORA-01400).
You should, however, see more than just the error message above; there
should be a warning message preceding or following it stating that not
all foreign keys could be inserted, and that message should give the
primary key. This should be the primary key of the bioentry that was
supposed to get the reference attached. Using SQL you should then be
able to identify which record it is, and then look it up on the
SwissProt site or in your SwissProt source file.
If the bioentry itself fails to load because of this problem, you
should see an error message to this effect, with a full stack trace.
Otherwise the bioentry did load and only the reference didn't; if you
don't really need this particular reference, you don't need to worry
about it.
You may also want to consider upgrading to a CVS snapshot from either
the 1.4 branch or the main trunk. There have been a few fixes to
modules, which I believe include the SwissProt parser.
>
> 2) The script stops after 2 hours (34500 tuples in table BioEntry) with
> message: Out of memory!
>
> I guess problem 1 causes problem 2. Is this reasonable, or do I have two
> separate problems?
The first one may not even be a real problem, see above. It is
extremely unlikely that it causes the memory problem.
SwissProt is a large, very diverse, and richly annotated data source,
and because bioperl-db caches a lot of things like ontology terms,
references, and dbxrefs, the loader process will eventually use
anywhere between 500MB and 1.3GB of RAM.
Given the amount of memory you have (12 GB), this shouldn't be a
limitation at all, unless perhaps you gave most of the memory to
Oracle running on the same machine.
I have run into a memory leak with DBD::Oracle, the Oracle 9iR2 client
library, and multi-thread enabled perl 5.8.1 on MacOSX, and you may be
seeing a similar problem. Try watching the loader process in top to see
how fast its memory consumption grows. It will grow as the object cache
fills up, but if you see it use more than 1GB before 100,000 records
have loaded, you have likely hit a memory leak. If that's the case,
you'll have to rebuild perl from source with multi-threading disabled.
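The rule of thumb above can be turned into a quick check. Below is a hypothetical Python helper, a sketch that assumes Linux's /proc/<pid>/status interface (which reports VmRSS in kB) and reads "1GB" as 1 GiB. The record count has to come from elsewhere, e.g. counting rows in the BioEntry table as in the report above.

```python
"""Hypothetical helper: apply the ">1GB before 100,000 records" leak heuristic."""

LEAK_RSS_BYTES = 1024 ** 3  # "more than 1GB", taken here as 1 GiB
LEAK_RECORDS = 100_000      # "... before 100,000 records have loaded"


def rss_bytes_from_status(status_text: str) -> int:
    """Parse the VmRSS line of /proc/<pid>/status (reported in kB)."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            value, unit = line.split()[1:3]
            if unit != "kB":
                raise ValueError(f"unexpected unit {unit!r}")
            return int(value) * 1024
    raise ValueError("no VmRSS line found")


def read_rss_bytes(pid: int) -> int:
    """Read the current resident set size of process `pid` (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return rss_bytes_from_status(fh.read())


def looks_like_leak(rss: int, records_loaded: int) -> bool:
    """True if memory passed ~1 GB while fewer than 100,000 records are loaded."""
    return rss > LEAK_RSS_BYTES and records_loaded < LEAK_RECORDS
```

Call read_rss_bytes() periodically with the loader's PID (as shown by top) and pass the result, together with the current record count, to looks_like_leak(). Normal cache growth should stay under the threshold well past 100,000 records.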
-hilmar
>
> I run Oracle and the load script on the same machine with:
> Suse Linux 9.0 (kernel 2.4.21-291-smp) with 12 GB RAM
> perl 5.8.1, built for i586-linux-thread-multi
> bioperl 1.4
> bioperl-db 0.1
BTW I assume this version is not correct; otherwise the latest BioSQL
schema wouldn't be supported, let alone its Oracle version. You
probably obtained a snapshot from CVS?
> DBI 1.48
> DBD::Oracle 1.16
> Oracle 9.2
> BioSQL schema for Oracle (downloaded from http://cvs.open-bio.org/ on
> 6th June 2005)
>
> Thanks for any suggestions,
> Jana
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------