[Bioperl-l] getting database hit sequences
Elia Stupka
elia@fugu-sg.org
Mon, 21 Oct 2002 13:46:54 +0800 (SGT)
> no way for the parser to get it. But it must somehow be possible (aren't
> other people interested in the sequences that scored hits???).
The best way to do this (and a lot of other things) is to keep a local
database of the sequences. You can use a flat-file database or a
MySQL-based database; both are very easy to set up with bioperl. See:
perldoc biodatabases.pod (in the root of the bioperl directory)
> the GI with a regexp. But how do I get the right sequence for a specific
> GI number efficiently? Parsing the entire database for every hit is
> O(num_sequences^2) and therefore much too slow to be feasible.
You definitely need an indexed database, such as the ones provided in
bioperl. Also, if you build the database from the same sequences you are
blasting against, it will be easier to know which ID to use to fetch each
hit.
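
To show why an index removes the need to rescan the whole file for every
hit, here is a rough sketch in Python rather than Perl. It is not how the
bioperl indexes are implemented. It assumes NCBI-style headers of the form
">gi|NUMBER|..." and the function names build_index and fetch_sequence are
made up for this illustration. The file is read once to record where each
record starts, and each later lookup is a single seek.

```python
import re
import tempfile

# Assumption: headers look like ">gi|12345|gb|ACCESSION|description".
GI_PATTERN = re.compile(r"gi\|(\d+)\|")


def build_index(fasta_path):
    """Scan the file once, mapping each GI number to its header's byte offset."""
    index = {}
    with open(fasta_path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                match = GI_PATTERN.search(line.decode("ascii", "replace"))
                if match:
                    index[match.group(1)] = offset
    return index


def fetch_sequence(fasta_path, index, gi):
    """Seek straight to one record and return its sequence as one string."""
    with open(fasta_path, "rb") as handle:
        handle.seek(index[gi])
        handle.readline()  # skip the header line
        parts = []
        for line in handle:
            if line.startswith(b">"):
                break  # reached the next record
            parts.append(line.strip().decode("ascii"))
    return "".join(parts)


# Made-up records purely for demonstration; the GI numbers are not real.
demo = ">gi|111|gb|X00001|first\nACGT\nACGT\n>gi|222|gb|X00002|second\nTTTT\n"
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
    tmp.write(demo)

index = build_index(tmp.name)
print(fetch_sequence(tmp.name, index, "222"))  # prints TTTT
```

With the index in memory, fetching the sequences for all hits costs one
pass over the file plus one seek per hit. A real setup would also store
the index on disk so it does not have to be rebuilt on every run.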
If you cannot afford the space to keep a database locally, then you can
(provided you can work out the GI number, etc.) fetch the sequences
remotely using Bio::DB::GenBank, etc.
Hope it helps,
Elia
********************************
* http://www.fugu-sg.org/~elia *
* tel: +65 6874 1467 *
* mobile: +65 9030 7613 *
* fax: +65 6779 1117 *
********************************