Script for incremental update of databases

jrvalverde at cnb.uam.es
Tue Mar 12 14:36:01 UTC 2002


"Dr J.C. Ison" <jison at hgmp.mrc.ac.uk> wrote:
> Johanne  -
> 
> If the database was made up of lots of files then Mirror
> might help:
> http://www.freecode.com/projects/mirror1/
> 

It is not really that simple; see the original question:

> > We will start using Emboss. We want certain databases to reside locally
> > (genbank and swissprot for example). Are there any scripts already
> > available that would update the database each day for new sequences
                                                          ^^^^^^^^^^^^^
> > only.
    ^^^^^

So the actual answer is that it depends on the bandwidth you have. If
you have enough to transfer the full cumulative update, which is at its
largest by the end of a release period, in less than one night, then
mirror or wget might be OK. Simply get either "cumulative.dat" or
"cum_*.dat".
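
A back-of-the-envelope check of the "one night" criterion can be
sketched in Python. The file size, link speed, efficiency factor and
length of the night used here are made-up figures for illustration, not
measurements of any real database or network:

```python
def transfer_hours(size_bytes, bandwidth_bits_per_s, efficiency=0.7):
    """Estimate the hours needed to transfer size_bytes over a link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    really usable (protocol overhead, shared link, and so on).
    """
    effective_bits_per_s = bandwidth_bits_per_s * efficiency
    seconds = size_bytes * 8 / effective_bits_per_s
    return seconds / 3600


def fits_in_one_night(size_bytes, bandwidth_bits_per_s,
                      night_hours=8.0, efficiency=0.7):
    """True if the estimated transfer time fits in the nightly window."""
    hours = transfer_hours(size_bytes, bandwidth_bits_per_s, efficiency)
    return hours <= night_hours


if __name__ == "__main__":
    cumulative_size = 2e9  # hypothetical 2 GB cumulative file
    for link in (128e3, 2e6):  # hypothetical 128 kbit/s and 2 Mbit/s links
        hours = transfer_hours(cumulative_size, link)
        print(f"{link:>10.0f} bit/s: {hours:6.1f} h, "
              f"fits: {fits_in_one_night(cumulative_size, link)}")
```

If the estimate does not fit comfortably in the window, plain mirroring
of the cumulative file is probably not the way to go.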

Otherwise things get complicated. The reason is that, as far as I know,
you can't assume the cumulative file only changes by addition (please
correct me if I'm wrong). If data were only ever appended each day, then
it would be trivial to use wget to fetch just the new data and append it
to your local copy. But sometimes entries get changed, deleted or merged.

There are daily updates you can get, but these have the same problem: if
an entry has been modified, deleted, merged... and you simply append a
daily update to your local copy, then you'll end up with duplicated
entries.
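
To make the problem concrete, here is a small Python sketch that appends
a daily update to a local copy and then looks for entry names that occur
more than once. It assumes an EMBL/Swiss-Prot style flat file, where
each entry starts with an "ID" line and ends with "//"; the entry names
and contents are invented for the example:

```python
from collections import Counter


def entry_names(flatfile_text):
    """Return the entry name from every 'ID' line, in file order."""
    names = []
    for line in flatfile_text.splitlines():
        if line.startswith("ID   "):
            parts = line.split()
            if len(parts) > 1:
                # The name is the first token after 'ID'; drop a trailing ';'
                names.append(parts[1].rstrip(";"))
    return names


def duplicated_names(flatfile_text):
    """Return the sorted entry names that appear more than once."""
    counts = Counter(entry_names(flatfile_text))
    return sorted(name for name, n in counts.items() if n > 1)


# Invented toy data: AAA001 was revised in the daily update.
local_copy = (
    "ID   AAA001     standard; DNA; 10 BP.\nDE   old description\n//\n"
    "ID   BBB002     standard; DNA; 12 BP.\nDE   unchanged\n//\n"
)
daily_update = (
    "ID   AAA001     standard; DNA; 11 BP.\nDE   revised description\n//\n"
    "ID   CCC003     standard; DNA; 20 BP.\nDE   brand new\n//\n"
)

naive_append = local_copy + daily_update
print(duplicated_names(naive_append))
```

The revised entry AAA001 ends up in the file twice, once in its old form
and once in its new form, which is exactly the situation described
above.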

The solution: get the daily/weekly update and list files, then use SynCron 
to merge the new data with the old version you have on your disk. 

SynCron is available at EBI and fine mirror sites:

	ftp://ftp.ebi.ac.uk/pub/software/unix/SynCron/
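
The general idea of such a merge can be sketched in a few lines of
Python. This is only an illustration of the principle and NOT how
SynCron itself works: it assumes EMBL/Swiss-Prot style entries ("ID"
line first, "//" last), keys entries on the name in the ID line, takes
the list of deleted entries as a plain list of names (a hypothetical
format), and holds everything in memory, which a real database would
not allow:

```python
def entry_name(lines):
    """Return the name from the entry's 'ID' line, or None if absent."""
    for line in lines:
        if line.startswith("ID   "):
            parts = line.split()
            if len(parts) > 1:
                return parts[1].rstrip(";")
    return None


def split_entries(text):
    """Map entry name -> full entry text, in first-seen order."""
    entries = {}
    current = []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == "//":  # end-of-entry marker
            name = entry_name(current)
            if name is not None:
                entries[name] = "\n".join(current) + "\n"
            current = []
    return entries


def merge_update(local_text, update_text, deleted=()):
    """Apply an update to a local copy without duplicating entries.

    Changed entries replace the old version, new entries are added at
    the end, and names listed in `deleted` (assumed input) are dropped.
    """
    merged = split_entries(local_text)
    merged.update(split_entries(update_text))
    for name in deleted:
        merged.pop(name, None)
    return "".join(merged.values())


# Invented toy data for illustration only.
local_copy = (
    "ID   AAA001     standard; DNA; 10 BP.\nDE   old description\n//\n"
    "ID   BBB002     standard; DNA; 12 BP.\nDE   to be deleted\n//\n"
)
daily_update = (
    "ID   AAA001     standard; DNA; 11 BP.\nDE   revised description\n//\n"
    "ID   CCC003     standard; DNA; 20 BP.\nDE   brand new\n//\n"
)

print(merge_update(local_copy, daily_update, deleted=["BBB002"]), end="")
```

An entry that was merged into another one would, in this sketch, simply
show up as one deleted name plus one changed entry.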

But, of course, it is easier if, instead of downloading only the daily
changes, you get the full cumulative update files, which have ALL the
changes made since the last release already applied in one (set of)
file(s).

				j



