[Bioperl-l] Bio::Seq -> Solr (Lucene) ?
cjfields at uiuc.edu
Thu Aug 30 02:01:45 UTC 2007
On Aug 29, 2007, at 5:11 PM, Jay Hannah wrote:
> Please slap me if I'm hysterical.
> I'm seeking a broad bioinformatics search engine platform. I want to
> take gobs of data in gobs of formats and allow people to search it on
> the web.
> - Entrez is awesome. Unfortunately I don't see anything in the NCBI
> toolkit that helps me run my own version of it. Even a tiny one. After
> an initial "check out our toolkit" response from NCBI I don't seem to be
> getting anywhere. Maybe I'm not communicating enough or well enough.
No. I've had non-responses from NCBI before; they may just be too
busy. Warnock's Dilemma probably applies.
> - EB-eye Search is slick. I don't see any developer kit or source code
> of any kind and I've gotten no response to my emails to them.
I'm not sure about this one personally.
> - LuceGene is very cool.
> I don't know Java.
...but you could write a (perl) wrapper around it. You can try
contacting Don Gilbert about it, though I think he's been trying out
> - Solr is really neat. It's easy to install and gives a simple/
> XML API to populate a Lucene index.
> ... so ...
> I'm thinking BioPerl knows how to parse lots of formats into a
> I'm thinking that would be really cool and I'm going to write it.
> Now's your chance to slap me.
> Since I haven't started yet, what would I call this thing?
> Bio::SeqIO::Solr? (and I wouldn't implement the I part?)
> Jay Hannah
> More notes:
The way I would go about it is to use an established XML schema as a
starting point and implement a writer for it (if BioPerl doesn't
already support it). That's better than reinventing a constantly
reinvented wheel by starting a brand-new schema of your own. INSDSeq
(http://www.insdc.org/page.php?page=xmlstatus) is one I've been
wanting to add for a while but haven't had time to work on; there are
several other examples. Note that a few of the formats BioPerl
currently supports, such as bsml and game, have seen little to no
development over the years as work has shifted toward newer (better?)
XML flavors, so it likely isn't worth working with those.
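
For the Solr side of things, here is a rough sketch of what a writer
would have to emit. Solr's XML update handler takes messages of the
form <add><doc><field name="...">...</field></doc></add>, so the job
is mostly mapping parsed record attributes onto named fields. This is
only an illustration under assumptions: the field names (id,
description, organism, keyword, length) and the record itself are
made up and would have to match whatever your Solr schema defines,
and it is written in Python with the standard library only, not as a
BioPerl module. A Bio::SeqIO-style writer would follow the same shape.

```python
import xml.etree.ElementTree as ET


def record_to_solr_add(record):
    """Build a Solr <add> update message (as a string) for one record.

    `record` is a plain dict of field name -> value. A list or tuple
    value is written as repeated <field> elements, which is how Solr
    expects multi-valued fields to be sent.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in record.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(v)  # ElementTree escapes <, > and & on output
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical record; none of these values come from a real database.
    example = {
        "id": "EXAMPLE0001",
        "description": "hypothetical example record",
        "organism": "Homo sapiens",
        "keyword": ["kinase", "membrane"],
        "length": 1200,
    }
    print(record_to_solr_add(example))
```

The resulting string is what you would POST to the Solr update
handler, followed by a commit. Keeping the record-to-field mapping in
one small function means the same parsed record could later be
written out as INSDSeq XML instead without touching the parsing code.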