[Bioperl-l] Bio::Seq -> Solr (Lucene) ?
Jay Hannah
jay at jays.net
Wed Aug 29 22:11:55 UTC 2007
Please slap me if I'm hysterical.
I'm seeking a broad bioinformatics search engine platform. I want to
take gobs of data in gobs of formats and allow people to search it on
the web.
- Entrez is awesome. Unfortunately I don't see anything in the NCBI
toolkit that helps me run my own version of it. Even a tiny one. After
an initial "check out our toolkit" response from NCBI I don't seem to be
getting anywhere. Maybe I'm not communicating enough or well enough.
- EB-eye Search is slick. I don't see any developer kit or source code
of any kind and I've gotten no response to my emails to them.
- LuceGene is very cool. But it looks like no one has touched it in 2.5
years and I've gotten no response from their contact email address. I'm
especially intrigued by their
src/LuceGene/src/org/eugenes/index/LuceneReadseqIndexer.java
which seems to use the rather popular(?) Java Readseq to populate Lucene
with source data in all sorts of different formats.
I don't know Java.
- Solr is really neat. It's easy to install and gives a simple/powerful
XML API to populate a Lucene index.
... so ...
I'm thinking BioPerl knows how to parse lots of formats into a Bio::Seq.
I'm thinking I could write Perl which would take a Bio::Seq object and
convert it to an XML file which Solr would happily inject into Lucene
for me.
If I could do that I'm thinking that any of the many formats that
Bio::SeqIO can slurp could magically be sent into a Lucene index for
searching.
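
A minimal sketch of that conversion step, written in Python only so it runs on the standard library; the real thing would be Perl fed by Bio::SeqIO. It assumes each parsed sequence has already been flattened to a plain dict, and the field names (id, description, sequence) are hypothetical: they would have to match whatever the Solr schema actually defines. The output is Solr's XML update message, an <add> element wrapping one <doc> per record.

```python
import xml.etree.ElementTree as ET


def record_to_solr_doc(record):
    """Build one Solr <doc> element from a dict of field name -> value."""
    doc = ET.Element("doc")
    for name, value in record.items():
        if value is None:
            continue  # skip empty fields instead of indexing the text "None"
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)  # ElementTree escapes &, < and > on output
    return doc


def records_to_solr_add(records):
    """Wrap any number of records in a single <add> update message."""
    add = ET.Element("add")
    for record in records:
        add.append(record_to_solr_doc(record))
    return ET.tostring(add, encoding="unicode")


# Hypothetical record standing in for the fields pulled off a sequence object.
example = {
    "id": "EXAMPLE0001",
    "description": "made-up record with <odd> & characters",
    "sequence": "ACGTACGT",
}
xml_text = records_to_solr_add([example])
print(xml_text)
```

The resulting string is what would be POSTed to the Solr update handler, followed by a commit. The parsing side is left out because it is exactly what Bio::SeqIO already does.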
I'm thinking that would be really cool and I'm going to write it.
Now's your chance to slap me.
Since I haven't started yet, what would I call this thing?
Bio::SeqIO::Solr? (Output only; I wouldn't implement the "I", i.e. the input, part?)
Thanks,
Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
More notes:
http://clab.ist.unomaha.edu/CLAB/index.php/RT11