[Bioperl-l] getting database hit sequences

Elia Stupka elia@fugu-sg.org
Mon, 21 Oct 2002 13:46:54 +0800 (SGT)


> no way for the parser to get it. But it must somehow be possible (aren't
> other people interested in the sequences that scored hits???).

The best way to do this (and a lot of other things) is to keep a local
database of the sequences. You can use a flat file database or a MySQL
based database; both are very easy to set up with bioperl, see:

perldoc biodatabases.pod (in the root of the bioperl directory)

> the GI with a regexp. But how do I get the right sequence for a specific 
> GI number efficiently? Parsing the entire database for every hit is 
> O(num_sequences^2) and therefore much too slow to be feasible.

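You definitely need an indexed database, such as the ones provided in
bioperl: the file is scanned once to build the index, and after that each
sequence is fetched directly by its id instead of re-reading the whole
file for every hit. Also, if you build the database from the sequences you
are blasting against, it will be easier to know which id to use to fetch
each one.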
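
To make the idea of an index concrete, here is a minimal sketch in Python
(not bioperl, and not how the bioperl modules are implemented). It assumes
a plain FASTA file in which the id is the first word after '>'; the names
build_index and fetch_sequence are made up for this illustration.

```python
def build_index(fasta_path):
    """Scan the FASTA file once and map each sequence id to its byte offset."""
    index = {}
    with open(fasta_path, "rb") as handle:
        while True:
            offset = handle.tell()  # position of the line about to be read
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # Assumption: the id is the first word after '>'.
                tokens = line[1:].split()
                if tokens:
                    index[tokens[0].decode("ascii")] = offset
    return index


def fetch_sequence(fasta_path, index, seq_id):
    """Jump straight to one record using the index and return its residues."""
    with open(fasta_path, "rb") as handle:
        handle.seek(index[seq_id])
        handle.readline()  # skip the header line
        parts = []
        for line in handle:
            if line.startswith(b">"):
                break  # next record starts here
            parts.append(line.strip())
    return b"".join(parts).decode("ascii")
```

The index is built in one pass over the file; each hit then costs a single
seek plus reading that one record, so the total work no longer grows with
the square of the number of sequences. A real index would also be saved to
disk so that it does not have to be rebuilt on every run.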

If you cannot afford the space to keep a database locally, then you can
(provided you can work out the GI number, etc. for each hit) fetch the
sequences remotely using Bio::DB::GenBank, etc.
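
For pulling the GI out of a hit name, something along these lines should
do. Again this is only a sketch in Python, assuming the hit names follow
the NCBI "gi|number|db|accession|" layout; extract_gi is a made-up helper
name and the identifiers in the comments are invented examples.

```python
import re

# Matches "gi|<digits>" at the start of the name or after another '|' field.
GI_PATTERN = re.compile(r"(?:^|\|)gi\|(\d+)")


def extract_gi(hit_name):
    """Return the GI number as a string, or None if the name carries no GI."""
    match = GI_PATTERN.search(hit_name)
    return match.group(1) if match else None


# extract_gi("gi|12345|gb|AAA00001.1|") gives "12345"
# extract_gi("sp|P00000|EXAMPLE") gives None
```

Names without a GI field come back as None, so those hits can be skipped
or looked up by accession instead of failing in the middle of a run.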

Hope it helps,

Elia

********************************
* http://www.fugu-sg.org/~elia *
* tel:    +65 6874 1467        *
* mobile: +65 9030 7613        *
* fax:    +65 6779 1117        *
********************************