[BioSQL-l] Genbank loading time

Peter biopython at maubp.freeserve.co.uk
Wed Jan 28 18:18:03 UTC 2009


>> You could re-invent the wheel, and write yet another
>> GenBank/EMBL/Swiss parser in standalone perl for use within
>> load_seqdatabase.pl but I really don't see any point to this.  Reusing
>> the BioPerl parser seems most sensible, especially given that
>> bioperl-db is an extension to bioperl in the first place - and the
>> BioPerl parsers already exist and are well tested.
>>
>> Peter
>
> My point is, instead of first mapping record data to a specific object/class
> then mapping the object data to the database, bypass the object completely
> and generically map relevant data directly in the database according to the
> BioSQL schema.
>
> If anything this may force some consistency between the various Bio*
> languages.
>
> chris

Ah - so rather than using BioPerl/Biopython/BioJava to import your
sequence files into a BioSQL database, you'd like BioSQL to come with
its own script that does the job?  That would "solve" any
inconsistencies in getting data files into the database, provided it
were the only sanctioned way to add records.  However, there are a
number of downsides, on top of the considerable extra effort needed
to write and support another set of parsers just for BioSQL (without
reusing BioPerl/Biopython/BioJava).

What about BioPerl/Biopython/BioJava users who have sequence-record
objects in memory that they want to record in the database?  These
could have been loaded from GenBank files originally and then
manipulated (e.g. by adding crude extra annotation from a BLAST run).
How would they get them into the database - write them to a GenBank
file and then invoke the project-neutral, BioSQL-provided script?

I think each project needs its own ORM bindings, both for loading
data into the database and for retrieving it.  Any inconsistencies in
how each project ends up storing sequence files (e.g. GenBank files)
can be ironed out gradually.
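
To make the in-memory case concrete, here is a minimal sketch of what
such a binding does: it takes a record object that has been modified
after parsing and writes it straight to the database, then reads it
back, with no intermediate GenBank file.  The Record class, the
save_record/load_record helpers and the "note" table are hypothetical
names invented for this illustration, and the tables are a heavily
simplified stand-in, not the real BioSQL schema.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for a toolkit's sequence-record object."""
    accession: str
    description: str
    seq: str
    notes: list = field(default_factory=list)


def create_tables(conn):
    # Simplified, illustrative tables (assumption); NOT the real BioSQL schema.
    conn.executescript("""
        CREATE TABLE bioentry (
            bioentry_id INTEGER PRIMARY KEY,
            accession   TEXT NOT NULL UNIQUE,
            description TEXT
        );
        CREATE TABLE biosequence (
            bioentry_id INTEGER PRIMARY KEY REFERENCES bioentry(bioentry_id),
            length      INTEGER,
            seq         TEXT
        );
        CREATE TABLE note (
            bioentry_id INTEGER REFERENCES bioentry(bioentry_id),
            rank        INTEGER,
            value       TEXT
        );
    """)


def save_record(conn, rec):
    """Write an in-memory record directly to the database."""
    cur = conn.execute(
        "INSERT INTO bioentry (accession, description) VALUES (?, ?)",
        (rec.accession, rec.description),
    )
    entry_id = cur.lastrowid
    conn.execute(
        "INSERT INTO biosequence (bioentry_id, length, seq) VALUES (?, ?, ?)",
        (entry_id, len(rec.seq), rec.seq),
    )
    conn.executemany(
        "INSERT INTO note (bioentry_id, rank, value) VALUES (?, ?, ?)",
        [(entry_id, rank, note) for rank, note in enumerate(rec.notes)],
    )
    conn.commit()
    return entry_id


def load_record(conn, accession):
    """Rebuild a Record object from the database rows."""
    row = conn.execute(
        "SELECT e.bioentry_id, e.description, s.seq "
        "FROM bioentry e JOIN biosequence s ON e.bioentry_id = s.bioentry_id "
        "WHERE e.accession = ?",
        (accession,),
    ).fetchone()
    if row is None:
        raise KeyError(accession)
    entry_id, description, seq = row
    notes = [
        value
        for (value,) in conn.execute(
            "SELECT value FROM note WHERE bioentry_id = ? ORDER BY rank",
            (entry_id,),
        )
    ]
    return Record(accession, description, seq, notes)
```

A caller would build or parse a Record, append whatever annotation it
has computed to rec.notes, and call save_record(conn, rec).  A
file-only import script cannot offer this path, which is the reason
for keeping per-project bindings.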

[Perhaps I have read more into your comment than you intended - if I
have got the wrong end of the stick, please clarify - thanks]

Still, a project-neutral script bundled with BioSQL (not depending on
any of BioPerl/Biopython/BioJava) for importing a GenBank file into a
database could serve as a "reference implementation" (the role I
currently assign to BioPerl's load_seqdatabase.pl).  And if it
proves faster than load_seqdatabase.pl, that's a nice bonus.
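
As a rough idea of the shape such a script might take, the sketch
below maps GenBank text directly onto database rows without building
any toolkit object in between.  It is only an illustration under
stated assumptions: parse_genbank and load_genbank are hypothetical
names, the parser reads just the LOCUS, DEFINITION, ACCESSION and
ORIGIN blocks (no features, references or taxonomy), and the two
tables are a cut-down subset loosely modelled on BioSQL's bioentry and
biosequence tables rather than the full schema.

```python
import re
import sqlite3

# Cut-down tables for illustration (assumption); the real schema has more
# columns and also requires biodatabase, taxon, term and other tables.
SCHEMA = """
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    accession   TEXT NOT NULL,
    description TEXT
);
CREATE TABLE biosequence (
    bioentry_id INTEGER PRIMARY KEY REFERENCES bioentry(bioentry_id),
    length      INTEGER,
    seq         TEXT
);
"""


def parse_genbank(text):
    """Yield (name, accession, description, seq) for each record in text."""
    for chunk in text.split("\n//"):
        if "LOCUS" not in chunk:
            continue  # trailing whitespace after the last terminator
        fields = {"LOCUS": "", "DEFINITION": "", "ACCESSION": ""}
        seq_parts = []
        key = None
        in_origin = False
        for line in chunk.splitlines():
            if line.startswith("ORIGIN"):
                in_origin = True
            elif in_origin:
                # Sequence lines carry position numbers and spaces; keep letters.
                seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
            elif not line.strip():
                continue
            elif not line[0].isspace():
                # A keyword starting in column 1 opens a new block.
                key, _, rest = line.partition(" ")
                if key in fields:
                    fields[key] = rest.strip()
            elif key == "DEFINITION":
                # Indented continuation of a multi-line DEFINITION.
                fields[key] += " " + line.strip()
        name = fields["LOCUS"].split()[0]
        accession = (fields["ACCESSION"].split() or [name])[0]
        # Upper-casing is a choice made for this sketch, not a BioSQL rule.
        yield name, accession, fields["DEFINITION"], "".join(seq_parts).upper()


def load_genbank(conn, text):
    """Insert every record found in text; return the number loaded."""
    count = 0
    for name, accession, description, seq in parse_genbank(text):
        cur = conn.execute(
            "INSERT INTO bioentry (name, accession, description) VALUES (?, ?, ?)",
            (name, accession, description),
        )
        conn.execute(
            "INSERT INTO biosequence (bioentry_id, length, seq) VALUES (?, ?, ?)",
            (cur.lastrowid, len(seq), seq),
        )
        count += 1
    conn.commit()
    return count
```

Even this toy version hints at the cost: every block it skips
(features, references, taxonomy) is parser logic that the Bio*
projects already have and test, and that a standalone script would
have to duplicate.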

Peter


