[Bioperl-l] Bio::Index::Blast

Jason Eric Stajich jason@cgt.mc.duke.edu
Wed, 5 Sep 2001 20:54:19 -0400 (EDT)


I added Bio::Index::Blast, which is simply a subclass of James Gilbert's
excellent Bio::Index::Abstract module.  It lets one index one or more
BLAST reports, spread across one or more files, by query sequence name.
If you want to query on several names for the same sequence, you can plug
in your own id_parser method, e.g. to split
ref|NT_100011|gb|101899282|AC011021 into NT_100011, 101899282, and
AC011021 rather than using the entire sequence name as the key.
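
An id parser of that kind mostly comes down to splitting the identifier on
'|' and discarding the short database tags.  Below is a rough Python sketch
of the idea.  It is not the BioPerl id_parser interface itself, and the
DB_TAGS set and the split_ids name are hypothetical choices for
illustration:

```python
from __future__ import annotations

# Hypothetical tag list: short database prefixes that appear in
# NCBI-style identifiers. Illustrative only, not exhaustive.
DB_TAGS = {"gi", "gb", "emb", "dbj", "ref", "sp", "tr", "pir", "prf",
           "pdb", "lcl", "gnl"}


def split_ids(name: str) -> list[str]:
    """Return every non-empty '|'-separated field that is not a DB tag."""
    return [field for field in name.split("|")
            if field and field not in DB_TAGS]


if __name__ == "__main__":
    print(split_ids("ref|NT_100011|gb|101899282|AC011021"))
    # -> ['NT_100011', '101899282', 'AC011021']
```

Each returned name would then become its own key in the index, all pointing
at the same report.  A plain name with no '|' simply comes back as a
one-element list.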

This is useful for someone who has a single file of concatenated BLAST
reports and wants to pull out the one matching a specific request, e.g.
a web page that summarizes BLAST results and then retrieves the detailed
information for a single report on demand.  In principle this lets one
simply concatenate BLAST results into one file rather than maintaining a
complicated naming scheme and 500k files lying around.
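
The underlying idea of indexing a concatenated file can be sketched as one
pass that records the byte offset and length of each report, keyed by query
name.  A later lookup can then seek straight to that one report.  This is a
Python approximation, not the module's actual code.  The report-start
pattern (a line beginning with BLASTN, BLASTP, BLASTX, TBLASTN or TBLASTX)
and the "Query=" line format are assumptions about the report layout, and
build_index/fetch_report are hypothetical names:

```python
import io
import re

# Assumption: each report starts with a program header line such as
# "BLASTN 2.0.14 ..." and names its query on a "Query= <name> ..." line.
REPORT_START = re.compile(rb"^T?BLAST[NPX]")
QUERY_LINE = re.compile(rb"^Query=\s*(\S+)")


def build_index(fh):
    """Map query name -> (byte offset, length) of its report in fh."""
    index = {}
    start = None   # byte offset where the current report begins
    name = None    # query name of the current report, once seen
    offset = 0     # running byte offset (file opened in binary mode)
    for line in fh:
        if REPORT_START.match(line):
            if start is not None and name is not None:
                index[name] = (start, offset - start)
            start, name = offset, None
        elif start is not None and name is None:
            m = QUERY_LINE.match(line)
            if m:
                name = m.group(1).decode()
        offset += len(line)
    if start is not None and name is not None:
        index[name] = (start, offset - start)
    return index


def fetch_report(fh, index, name):
    """Seek directly to one report instead of re-reading the whole file."""
    start, length = index[name]
    fh.seek(start)
    return fh.read(length).decode()


if __name__ == "__main__":
    # Toy input standing in for a file of concatenated reports.
    fh = io.BytesIO(b"BLASTN 2.0.14\n\nQuery= seq1 first query\nhits...\n"
                    b"BLASTP 2.0.14\n\nQuery= seq2\nhits...\n")
    idx = build_index(fh)
    print(fetch_report(fh, idx, "seq2"))
```

With a real file you would open it with open(path, "rb") so that the
recorded offsets are true byte positions for seek().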

The index files are stored using Berkeley DB by default, but SDBM, GDBM,
and other DBM formats can be used instead; details are in the
Bio::Index::Abstract POD.
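
Python's standard dbm package has a similar pluggable-backend arrangement.
It uses dbm.gnu, dbm.ndbm or the pure-Python dbm.dumb, whichever is
available.  Persisting a name -> (offset, length) index in a DBM file can
therefore be sketched as shown below.  The "offset,length" string encoding
and the save_index/lookup names are hypothetical, and this is not meant to
mirror Bio::Index::Abstract's on-disk record format:

```python
import dbm
import os
import tempfile


def save_index(path, index):
    """Write {name: (offset, length)} into a DBM file at path."""
    # "c" opens the database read/write, creating it if it doesn't exist.
    with dbm.open(path, "c") as db:
        for name, (offset, length) in index.items():
            db[name] = f"{offset},{length}"


def lookup(path, name):
    """Read one entry back; DBM values are returned as bytes."""
    with dbm.open(path, "r") as db:
        offset, length = db[name].decode().split(",")
        return int(offset), int(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "blast_index")
        save_index(path, {"seq1": (0, 120), "seq2": (120, 95)})
        print(lookup(path, "seq2"))   # -> (120, 95)
        print(dbm.whichdb(path))      # which backend was actually used
```

Because callers only go through dbm.open, swapping the backend does not
change the indexing code.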

Hopefully I haven't duplicated anything that already exists.
Feel free to jump in and optimize, fix, or suggest new ideas.

-jason

-- 
Jason Stajich
Duke University
jason@cgt.mc.duke.edu