[Bioperl-l] Loading SwissProt Data into Oracle

Ewan Birney birney at ebi.ac.uk
Sat Apr 19 20:07:01 EDT 2003



On Fri, 18 Apr 2003, Jason Stajich wrote:

> In the biosql CVS repository you want the script
> bioperl-db/scripts/biosql/load_seqdatabase.pl
>
> see http://cvs.open-bio.org/ for more info on our repository.
>
> In the nearish future I expect Hilmar/Aaron/Chris will make a bioperl-db
> release using the current biosql schema.

Yup - in biosql-schema (check out the cvs link above) there are a series
of docs in the "docs" subdirectory which step through the load system in
somewhat mind-numbing detail.


You will notice that the details of the Oracle load are "left as an
exercise to the reader". If you could update the document on how to load
into Oracle in that real "cookbook" way people like, that would be great.




One of the things to realise is that swissprot is by far the most
semantically rich, and also the most challengingly ... baroque ..., of the
flat file formats. BioSQL is tied to the bioperl object model, and there
*are* one or two of the swissprot semantics which are not represented in
the bioperl object model (the insanity of multi-species entries is one such
lovely; BTW, swissprot has promised not to make any *more* multi-species
entries and will get around to sorting out the existing ones at some
point...).
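
To make the multi-species problem concrete, here is a minimal sketch in plain Python (not bioperl) that scans Swiss-Prot-style text and flags entries whose OS lines name more than one organism. The sample records, the accession numbers, and the helper names (split_entries, species_of, entry_id) are all made up for illustration, and the sketch assumes that species on the OS lines are separated by commas and "and". An object model with a single species slot per sequence has nowhere to put the second name.

```python
import re

# Made-up sample in Swiss-Prot-style line layout (not real records).
SAMPLE = """\
ID   EXAMPLE1_HUMAN   STANDARD;      PRT;   120 AA.
AC   X00001;
OS   Homo sapiens (Human).
//
ID   EXAMPLE2_ECOLI   STANDARD;      PRT;    85 AA.
AC   X00002;
OS   Escherichia coli, and
OS   Salmonella typhimurium.
//
"""

def split_entries(text):
    """Yield each entry as a list of (line_code, data) pairs; '//' ends an entry."""
    entry = []
    for line in text.splitlines():
        if line.startswith("//"):
            if entry:
                yield entry
            entry = []
        elif line.strip():
            # Two-letter line code, three spaces, then the data.
            entry.append((line[:2], line[5:].strip()))
    if entry:
        yield entry

def species_of(entry):
    """Join the OS lines and split them into individual species names.

    Naive by design: a comma inside a single organism name would be
    mis-split, which is exactly the kind of detail a real parser must handle.
    """
    joined = " ".join(data for code, data in entry if code == "OS").rstrip(".")
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", joined)
    return [p.strip() for p in parts if p.strip()]

def entry_id(entry):
    """Return the entry name from the ID line, or None if there is none."""
    for code, data in entry:
        if code == "ID":
            return data.split()[0]
    return None

# Entries that a one-species-per-sequence model cannot represent faithfully.
multi = [(entry_id(e), species_of(e))
         for e in split_entries(SAMPLE)
         if len(species_of(e)) > 1]
```

Here multi ends up holding only the second sample entry, paired with its two species names.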



So, if you have someone looking to get every last iota of detail out of
swissprot, you either:

  (a) have to write the schema and your own loader yourself (not
recommended)

OR

  (b) help finish off these details in (i) the bioperl object model (the
details would get added to Bio::Seq::RichSeq), (ii) the parser in
Bio::SeqIO::swiss, and (iii) the BioSQL schema data model (it may be that
many of the details can be stored inside the ontology tables, but ... I
wonder) and the bioperl-db bindings
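
As a rough picture of what "stored inside the ontology tables" could mean for point (iii), the sketch below keeps extra per-entry details as ranked term/value rows. It uses an in-memory SQLite database from Python purely for illustration. The table and column names (entry, term, entry_term_value, position) and the helper functions are hypothetical; they are not the actual BioSQL schema, and this is not Oracle DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: one row per entry, one row per named term, and a
# link table holding any number of ordered values per (entry, term).
conn.executescript("""
CREATE TABLE entry (
    entry_id  INTEGER PRIMARY KEY,
    accession TEXT UNIQUE NOT NULL
);
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT UNIQUE NOT NULL
);
CREATE TABLE entry_term_value (
    entry_id INTEGER NOT NULL REFERENCES entry(entry_id),
    term_id  INTEGER NOT NULL REFERENCES term(term_id),
    position INTEGER NOT NULL,
    value    TEXT NOT NULL,
    PRIMARY KEY (entry_id, term_id, position)
);
""")

def get_term_id(conn, name):
    """Look up a term by name, creating it on first use."""
    conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT term_id FROM term WHERE name = ?", (name,)).fetchone()[0]

def add_values(conn, accession, term_name, values):
    """Attach an ordered list of values to an entry under one term."""
    conn.execute(
        "INSERT OR IGNORE INTO entry (accession) VALUES (?)", (accession,))
    eid = conn.execute(
        "SELECT entry_id FROM entry WHERE accession = ?",
        (accession,)).fetchone()[0]
    tid = get_term_id(conn, term_name)
    for position, value in enumerate(values, start=1):
        conn.execute(
            "INSERT INTO entry_term_value VALUES (?, ?, ?, ?)",
            (eid, tid, position, value))

# A made-up multi-species entry: both names fit without a schema change.
add_values(conn, "X00002", "species",
           ["Escherichia coli", "Salmonella typhimurium"])

rows = conn.execute("""
SELECT e.accession, t.name, v.position, v.value
FROM entry_term_value v
JOIN entry e ON e.entry_id = v.entry_id
JOIN term  t ON t.term_id  = v.term_id
ORDER BY v.position
""").fetchall()
```

The trade-off is the one hinted at above: a generic term/value layout can absorb almost any detail, but the meaning then lives in the term names rather than in the schema, so the loader and every query have to agree on them.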



However, this is just a heads-up on the challenges involved in parsing the
whole of swissprot. By and large, Bioperl does a pretty good job.



>
> -jason
>
> On Fri, 18 Apr 2003, Neil Evans wrote:
>
> > Hello,
> >
> > I'm interested in loading the raw swissprot data into an Oracle database.  This, of course, involves:
> > 1. The parsing of the data to produce some DB-friendly format (possibly SQL)
> > 2. The design/loading of DB schema to contain the data
> > 3. The actual loading of the data
> >
> > I've done some reading on BioSQL and I think that may be the way to go.  I figure there must be some PERL script out there which can parse swissprot, and possibly even perform the SQL generation.
> >
> > Any pointers?
> >
> > thanks!
> > -Neil.
> >
> > --
> > ===============================================
> > Neil.Evans at oracle.com
> > Senior Software Developer
> > Oracle Web Services UDDI Registry
> > Oracle Corporation
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>