[Bioperl-l] Bio::Index::Blast
Jason Eric Stajich
jason@cgt.mc.duke.edu
Wed, 5 Sep 2001 20:54:19 -0400 (EDT)
I added Bio::Index::Blast, which is just a subclass of James Gilbert's
excellent Bio::Index::Abstract module. It lets you index one or more
BLAST reports, stored in one or more files, by query sequence name. You can
plug in your own id_parser method if you want to query on each of
several names packed into one sequence name, i.e. splitting up
ref|NT_100011|gb|101899282|AC011021 into NT_100011, 101899282, AC011021
rather than indexing only the entire sequence name.
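
To make the splitting concrete, here is a minimal sketch of the idea in Python rather than Perl. It is not the module's id_parser interface: the function name split_ids and the set of database tags to drop are assumptions made up for illustration.

```python
# Hypothetical helper, not part of BioPerl.
# Assumption: these short fields are database tags, not identifiers.
DB_TAGS = {"ref", "gb", "emb", "dbj", "gi", "sp", "pir", "pdb"}


def split_ids(name):
    """Return the individual identifiers packed into a pipe-delimited name."""
    fields = name.split("|")
    # Drop empty fields (e.g. from a trailing "|") and the database tags.
    return [field for field in fields if field and field not in DB_TAGS]


ids = split_ids("ref|NT_100011|gb|101899282|AC011021")
print(ids)
```

Each identifier returned this way would become its own key pointing at the same report, so a lookup on any one of them finds it.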
This is useful for someone who has a single file of concatenated BLAST
reports and wants to pull out the one for a specific request, e.g. providing a
summary webpage of BLAST results and then retrieving the detailed
information for a single report. This would, in theory, allow one to just
concatenate BLAST results into one file rather than having a complicated
naming system and 500k files lying around.
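
The general idea behind pulling one report out of a concatenated file can be sketched as a byte-offset index. This is a rough Python illustration, not how Bio::Index::Blast is actually implemented. It assumes every report starts with a line beginning with "BLAST" and that the query name is the first word after "Query="; build_index and fetch_report are hypothetical names, and a plain dict stands in for the on-disk index.

```python
import io


def build_index(handle):
    """Map each query name to the byte offset where its report starts."""
    index = {}
    report_start = None
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b"BLAST"):
            # Assumption: a line starting with "BLAST" opens a new report.
            report_start = offset
        elif line.startswith(b"Query=") and report_start is not None:
            fields = line[len(b"Query="):].split()
            if fields:
                index[fields[0].decode()] = report_start
            report_start = None  # only the first Query= line per report counts
    return index


def fetch_report(handle, index, name):
    """Seek to the stored offset and read up to the next report or EOF."""
    handle.seek(index[name])
    lines = [handle.readline()]  # header line of the requested report
    for line in iter(handle.readline, b""):
        if line.startswith(b"BLAST"):
            break
        lines.append(line)
    return b"".join(lines).decode()


# Two tiny made-up "reports" concatenated into one in-memory file.
demo = io.BytesIO(
    b"BLASTN 2.2.1\nQuery= seqA first query\nhit 1\n"
    b"BLASTN 2.2.1\nQuery= seqB second query\nhit 2\n"
)
index = build_index(demo)
report = fetch_report(demo, index, "seqB")
print(report)
```

Only the name-to-offset map needs to be kept around, which is why one big file plus a small index can replace a directory full of individually named reports.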
The index files are stored using Berkeley DB, but SDBM, GDBM, and others
can be used instead; details are in the Bio::Index::Abstract POD.
Hopefully I've not duplicated anything that already exists.
Feel free to jump in and optimize, fix, suggest new ideas.
-jason
--
Jason Stajich
Duke University
jason@cgt.mc.duke.edu