[Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot

Hilmar Lapp hlapp at gmx.net
Sun Mar 23 19:22:46 UTC 2008


On Mar 23, 2008, at 2:16 PM, Erik wrote:
> On Sun, March 23, 2008 02:40, Hilmar Lapp wrote:
>> But maybe load_seqdatabase.pl or even BioSQL or BioPerl
>> aren't suitable for your use-case?
>
> well, that may turn out to be the case, but I'm not quite
> deterred yet.
>
> I am in a situation like many others, I think: microarray,
> mass spec, and chipseq (Solexa) data all need
> annotation, and while it is easy to retrieve some useful
> records from public data sources (entrez, ensembl &
> biomart, etc.), it is not so easy to have such high
> atomicity in the locally stored annotation data that
> fine-grained filtering and sorting on a sql level becomes
> possible.  I hope the bioperl parsers, together with the
> biosql schema, will give SQL access to all or most data
> bits.

If by data bits you mean annotation, then yes, it should be fairly
normalized (possibly more normalized than you want, in fact).
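
As a rough sketch of what that normalization means in practice: the
example below uses Python's sqlite3 with a heavily simplified, assumed
subset of BioSQL-style tables (the real schema has more tables and
columns), and the rows are made-up toy data. The helper name
entries_with_annotation is hypothetical. Even a simple filter on one
annotation value needs a join through a term table:

```python
import sqlite3

# Assumption: a simplified subset of BioSQL-style tables, not the full schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    accession   TEXT NOT NULL,
    name        TEXT NOT NULL
);
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE bioentry_qualifier_value (
    bioentry_id INTEGER NOT NULL REFERENCES bioentry(bioentry_id),
    term_id     INTEGER NOT NULL REFERENCES term(term_id),
    value       TEXT
);
""")

# Toy rows for illustration only; accessions and names are invented.
conn.executemany(
    "INSERT INTO bioentry VALUES (?, ?, ?)",
    [(1, "X00001", "EXAMPLE1"), (2, "X00002", "EXAMPLE2")],
)
conn.executemany(
    "INSERT INTO term VALUES (?, ?)",
    [(1, "keyword"), (2, "comment")],
)
conn.executemany(
    "INSERT INTO bioentry_qualifier_value VALUES (?, ?, ?)",
    [(1, 1, "Kinase"), (1, 2, "toy comment"), (2, 1, "Transport"), (2, 1, "Kinase")],
)


def entries_with_annotation(conn, term_name, value):
    """Return accessions of entries annotated with (term_name, value)."""
    rows = conn.execute(
        """
        SELECT b.accession
        FROM bioentry b
        JOIN bioentry_qualifier_value q ON q.bioentry_id = b.bioentry_id
        JOIN term t ON t.term_id = q.term_id
        WHERE t.name = ? AND q.value = ?
        ORDER BY b.accession
        """,
        (term_name, value),
    ).fetchall()
    return [row[0] for row in rows]


print(entries_with_annotation(conn, "keyword", "Kinase"))
```

The point of the sketch is only that each annotation is a row keyed by
a term rather than a column on the entry, which is what makes
fine-grained filtering possible and also what makes the queries longer.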

Also, using BioSQL as the sequence and sequence annotation add-on to
some other database holding your lab data is what many others have
used it for, too.

>
> And I understand GBrowse can run on top of BioSQL/Pg too,
> albeit somewhat preliminary; this is another usage I will
> need.

It can, though keep in mind that that's not the use-case it (BioSQL)
was built for. If you need rapid access to genome intervals with tens
of thousands of features and their annotation, you'll have to start
thinking about a more de-normalized data store to run this off of,
such as a native GBrowse GFF store that you populate.

>
> btw, should not all those references to postgres 7.3 be
> upgraded to something newer, like 8.2.7 (maybe not yet 8.3
> heh) ?  7.3 is not supported anymore by the pg project.

Oops, indeed. Where are they?

>
> Sprot loaded in 20 hours. Only 170 were rejected - not too
> bad.

That's great. It would be nice if you could provide a rough summary of
why they were rejected (if that's obvious), such as taxon errors or
other errors.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================





