[Bioperl-l] Interpro parsing problems - and solutions
Juguang Xiao
juguang at tll.org.sg
Tue Jan 6 22:11:19 EST 2004
Hi Richard,
I ran into a similar problem before, and my solution was to write
Bio/OntologyIO/Handlers/InterPro_BioSQL_Handler.pm for loading
InterPro into a BioSQL database, since a serious user of InterPro will
load the whole database rather than the small piece used in the test.
When you load all of InterPro into memory, you will find that 1) it
fills your 1 GB of virtual memory on Mac OS X, so that you cannot do
any further operations, and 2) nearly 10% of the records are missing
with the current parser.
I am trying to improve BioSQL a bit, with Hilmar's guidance, so that it
can accommodate InterPro records, and I will complete the InterPro-BioSQL
parser after that. The work will be finished later, and I will announce
it on this list.
More generally, my point is that for huge file-based biological
databases it is wise to load the complete set into an RDBMS rather than
parsing it in memory, both for the speed of complicated queries and for
the reliability of your script (which otherwise risks running out of
memory).
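
To illustrate the idea only, here is a minimal sketch in Python rather
than BioPerl. It streams records out of an XML file one at a time and
writes them to SQLite, so the whole file is never held in memory. The
element names (interpro, name), the "entry" table and the sample data are
all made up for the example; this is not the BioSQL schema and not the
handler mentioned above.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Made-up sample data; the element and attribute names are assumptions.
SAMPLE = b"""<interprodb>
  <interpro id="X0001"><name>example entry one</name></interpro>
  <interpro id="X0002"><name>example entry two</name></interpro>
</interprodb>"""


def load_entries(xml_source, conn, batch_size=1000):
    """Stream <interpro> elements into a hypothetical 'entry' table.

    Returns the number of records written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entry (id TEXT PRIMARY KEY, name TEXT)"
    )
    insert = "INSERT OR REPLACE INTO entry (id, name) VALUES (?, ?)"
    batch, total = [], 0
    # iterparse yields each element as soon as its closing tag is read.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "interpro":
            continue
        batch.append((elem.get("id"), elem.findtext("name")))
        elem.clear()  # drop the record's text and children once stored
        if len(batch) >= batch_size:
            conn.executemany(insert, batch)
            total += len(batch)
            batch = []
    if batch:  # write whatever is left after the last full batch
        conn.executemany(insert, batch)
        total += len(batch)
    conn.commit()
    return total


conn = sqlite3.connect(":memory:")
count = load_entries(io.BytesIO(SAMPLE), conn)
print(count, "records loaded")
```

Once the records are in the database, a complicated query becomes an
SQL statement instead of a walk over in-memory objects, and the memory
used by the script no longer grows with the size of the file.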
Juguang
On Tuesday, January 6, 2004, at 07:58 am, Holland, Richard wrote:
> Hi all. A long one this but I hope it's worth the read.
>
> I found a possible bug in Bio/OntologyIO/InterProParser.pm.