[Bioperl-l] Re: [BioSQL-l] Proposal on Bio::DB::BioSQL::MultiDB

Wed May 28 01:29:57 EDT 2003

I noticed this. Sounds cool. I'd be interested to hear under which 
circumstances and under which configuration biosql/bioperl-db with 
everything in a single database hit its limits for you.

I was wondering whether to add transaction methods (commit, rollback) 
to the adaptor factory (DBAdaptorI, BioSQL::DBAdaptor, and now 
BioSQL::MultiDB). So far I was assuming that there is only one 
connection, and if you want to commit it doesn't really matter for 
which adaptor you call commit(). That assumption isn't valid in general
though, and in MultiDB for instance it's wrong. As an example, if I
loaded 1 swissprot sequence and 1 embl sequence as a single logical
unit of work and then committed once, only one of the two transactions
(one per server) would get closed and the other would stay open. I'd
have to know that I have to commit both sequences individually.

A cleaner approach than adding commit/rollback to DBAdaptorI would
probably be to encapsulate the transaction in a class, and then you'd
ask the adaptor factory for the transaction. The transaction would then
know which connections are 'dirty'.
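
To make the idea concrete, here is a minimal sketch, not an existing
bioperl-db API. Every name in it (Sketch::Connection,
Sketch::Transaction, mark_dirty) is hypothetical, and the connection
class is only a stand-in that records calls where a real database
handle would be. It assumes each adaptor reports a write by calling
mark_dirty() on the transaction.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a database connection. A real one would
# wrap a database handle; this one only records what was called on it.
package Sketch::Connection;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name, calls => [] }, $class;
}
sub name     { return $_[0]{name} }
sub commit   { my $self = shift; push @{ $self->{calls} }, 'commit';   return 1 }
sub rollback { my $self = shift; push @{ $self->{calls} }, 'rollback'; return 1 }
sub calls    { return @{ $_[0]{calls} } }

# Hypothetical transaction object: remembers which connections were
# written to, and commits or rolls back exactly those.
package Sketch::Transaction;

sub new { my ($class) = @_; return bless { dirty => {} }, $class }

# Assumed to be called by whichever adaptor writes through $conn.
sub mark_dirty {
    my ($self, $conn) = @_;
    $self->{dirty}{ $conn->name } = $conn;
    return $self;
}

# Names of the connections with uncommitted work, in sorted order.
sub dirty_names {
    my $self  = shift;
    my @names = sort keys %{ $self->{dirty} };
    return @names;
}

# Apply commit or rollback to every dirty connection, then forget them.
sub _finish {
    my ($self, $method) = @_;
    $_->$method for values %{ $self->{dirty} };
    $self->{dirty} = {};
    return 1;
}
sub commit   { return $_[0]->_finish('commit') }
sub rollback { return $_[0]->_finish('rollback') }

package main;

my $swissprot_conn = Sketch::Connection->new('swissprot');
my $embl_conn      = Sketch::Connection->new('embl');
my $idle_conn      = Sketch::Connection->new('idle');   # never written to

my $txn = Sketch::Transaction->new;
$txn->mark_dirty($swissprot_conn);   # e.g. after storing a swissprot sequence
$txn->mark_dirty($embl_conn);        # e.g. after storing an embl sequence
$txn->commit;                        # one call reaches both dirty connections
```

Note that this only saves the caller from tracking connections by hand.
It does not make the commit atomic across servers: if the second
commit fails, the first has already gone through.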

Nothing urgent, but at some point we'll want to think about this.

	-hilmar

On Monday, May 19, 2003, at 12:16  AM, Juguang Xiao wrote:

> Hi guys,
>
> I wrote this module to solve an actual problem that came up in our
> project. I hope it is general enough to put here.
>
> Comments are most welcome. Thanks
>
> Juguang
> ----------
> # Adaptor for Multiple BioSQL databases.
> # By Juguang Xiao <juguang at tll.org.sg>
>
> =head1 NAME
>
>     Bio::DB::BioSQL::MultiDB
>
> =head1 DESCRIPTION
>
> Scalability issues arise when multiple huge bio databases are loaded
> into a single RDBMS database, due to the limits of the RDBMS. One
> solution is simply to distribute them across multiple physical
> databases, while the user still manages them through one logical
> adaptor.
>
> MultiDB aims to solve this issue. Applying it is pretty simple.
> First, load data from the different biodatabases, such as swissprot
> or embl, into separate physical RDBMS databases; then create a db
> adaptor for each physical biosql db; finally, register these adaptors
> with MultiDB and use it as if it were a normal dbadaptor.
>
> =head1 USAGE
>
> use Bio::DB::BioSQL::MultiDB;
>
> # create the common biosql db adaptors
> # Physical databases may be located on different servers
> # or be accessible by different users.
> my $swissprot_db;
> my $embl_db;
>
> # register them by bio-database
> my $multiDB = Bio::DB::BioSQL::MultiDB->new(
>     'swissprot' => $swissprot_db,
>     'embl' => $embl_db
> );
>
> # Each time before you create a persistent object for a Bio::Seq,
> # first set the 'namespace' of the seq object to the biodatabase name.
> my $seq;    # for either store or fetch.
> $seq->namespace('swissprot');
>
> # OR assign the default namespace for multiDB
> $multiDB->namespace('swissprot');
>
> my $pseq = $multiDB->create_persistent($seq);
> $pseq->store;
>
>
> # If you want to fetch a seq, you have to specify the namespace for
> # multiDB first
> $multiDB->namespace('swissprot');
> $pseq = $multiDB->get_object_adaptor->find_by_unique_key($seq);
>
> =cut
>
> ------------ATGCCGAGCTTNNNNCT--------------
> Juguang Xiao
> Temasek Life Sciences Laboratory, National University of Singapore
> 1 Research Link,  Singapore 117604
> juguang at tll.org.sg
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------