[Bioperl-l] Bioperl hackathon?

Andrew Dalke dalke@dalkescientific.com
Fri, 3 Aug 2001 00:38:16 +0100

I would be interested in such a hack-a-thon.  There are four
proposals I know of for where to host it: AZ (Arizona), ZA (Cape
Town, South Africa), RTP (Research Triangle Park, North Carolina)
and Boston.  Of these I prefer US locations because they are
cheap for me.  (Tucson is $110 away from Santa Fe, RTP is $390,
Boston is $420, and Cape Town is $1600.)  BTW, London to Tucson
round-trip is $525, according to Travelocity.

There are several possibilities for such a thon.  I would
prefer a strongly multiproject related one - that is, content
people only (coders, documenters, QA) from the different projects
for the goals of working on
  - data models (eg, the taxonomy class used in bioperl cannot
       handle multiple common names)
  - integration (command-line, CORBA, XML, DAS, SOAP, SQL)
  - standardization (same names for the same terms)
  - database and format support (did you think I would omit this? :)
  - cross-pollination of ideas (eg, get bioperlers to use the
        biopython BNF for parsing sequence locations; and our prosite
        pattern to regular expression converter)
  - multi-project test cases (eg, here's a bunch of BLAST files
      and here's what you should get as output; here's some prosite
      patterns; here's some SWISS-PROT records; here's some ...)

I would also like to work on use cases and persona development,
but I suspect that won't be all that popular.  The first might
help with writing tutorials, but the second is only really useful
if there's a product with a reasonably well defined end-user.

Another possibility is a fast introduction to software development
methodologies, like XP and participatory design.  ORA should know
the right people, including some who even know how to tweak these
approaches for open source/distributed projects.

What would this need?  In my dreams:

  a good server machine, prepopulated with MySQL, the languages,
the needed libraries, EMBOSS, BLAST, other free programs, data
sets (why not all of GenBank, SWISS-PROT, PDB, the NCBI FASTA
files, etc.? - it's only a couple hundred MBs :)  This can all
be done beforehand; otherwise setting things up wastes hours.
  100Mbit local network, with wireless
  network access to the outside world
  workgroup quality b&w laser printer

Physical Environment:
  one large room where everyone can fit, and with projectors
(video and overhead) and whiteboards
  a couple small rooms for subgroups to work together, talk
loudly, fight holy wars, play music at hacking level.  Must
also have whiteboard.
  *real*tables*and*chairs* where I can work on my laptop at
the right height and posture and not get sore after a couple
of hours of working.
  I don't think there should be more than 15 people, and there must
not be more than about 20.  Why?  With more than that, some people
(like me) become inhibited - I feel like I'm wasting others' time.
Ewan's list has about a dozen people, so I don't expect many more.
(There's an exception to this, which is there can be more people
if there is a really good moderator and well defined goals for
when everyone meets together.)

  Three days.  One is too short, unless there is really good
organization beforehand.  It takes a while to get up to speed,
see who you're going to work with, figure out what to do.  Two
days could work, but that's enough time to work on only one
project (the Python bioperl-db integration work took about 9
straight hours).  So 1/2 day for talks and discussions, 1/2
day to get up to speed coding, 2 days for various bits of
coding.  Beyond 3 days, the intense, continuous work can become
too much.

No distractions:
  This is the biggest problem with doing this thon at the same
time as the O'Reilly conference.  How many people would like
to attend the conference meetings?  But multitasking when coding
just doesn't work that well, and on top of that there will be
people who want to code together but want to attend different
talks.
  I wanted to do some of the tasks I mentioned above when I
visited EBI.  I did some of it, but most people were involved
with getting EnsEMBL out the door, leaving little time to talk
with people.  I wanted to do more at BOSC, but there were
meetings scheduled *all* *day* *long* (including biopathways)
which made it hard to get together for more than a few minutes.
It was even worse at ISMB.
  I've been to several programming contests which took place
during a conference.  Almost invariably no one went to any of
the conference talks.  (It was a well organized competition;
they deliberately closed the lab for an hour so we could attend
the keynote address :)

Boot Camp at the same time?  I don't think so.
  Another possibility is to have a boot camp training session
for new users.  I think this is important to have, because it's
a way to do large scale user testing, and see where they have
problems (either where the code or the docs need to be fixed)
and learn more about their needs.  However, having a boot camp
is also distracting since the content developers should be
watching (and doing the proverbial sitting on their hands and
not saying a thing - only observing people's problems and how
they try to resolve them).
  That isn't to say that boot camps, etc. are bad.  However,
based on what I saw of BOSC/ISMB, I think those are much more
appropriate places for a boot camp, since a boot camp doesn't need
as much continuous work as content development does.

That said, here's the most likely possibility for that to
occur (in my own, naive world :)

  Have the code-a-thon be part of the ORA conference in AZ.
Have it start before the main conference, that is, on Saturday and
Sunday.  We come into the thon with several well-defined ideas
of what we might work on (not that those need to occur; must
be flexible).  The server and network stay up and available
through the whole of the conference.

  I don't know enough about the costs for doing this.  Even
the likely conference fee will be a lot (the Open Source
Convention was about $900/3 days, while ISMB was about $830/4
days).  I'm a small company and can't afford much as a business
expense, unless it produces usable code, contacts or contracts.
I honestly don't know who will go to this conference, but as
my target customers are computational biologists and chemists
(not covered under the "two overarching themes") I suspect
there will be few leads, so I can only really justify this
conference if there's some good coding and chance to learn/use
more about large-scale systems for this field.

  Uh-oh.  Just checked - the price for a room at the Westin La
Paloma is $350/night(!).  Five nights is nearly $2,000.  For
that I could fly two people from England and put them up at the
hotel near my place here in Santa Fe, and buy them lift tickets
for a day and include meals and margaritas.

  Is that really what the prices will be like for the conference
(about $2,500 for conference and room fees)?  If so, it's very
unlikely that I can go, and I would ask the thon be in RTP or
Boston.  And if in Boston then emphatically *not* in January.