[Bioperl-l] BLAST and BPlite problems
Ian Korf
ikorf@sapiens.wustl.edu
Fri, 2 Mar 2001 21:50:51 -0600 (CST)
On the important problem that the statistics are at the end of the BLAST
report but BPlite reads as a stream:
I think the BPpsilite strategy of writing a temp file and doing two parses is
a good one. It gets you all the functionality you need even if it feels
like a bit of a hack. I have two cleaner, but not necessarily better,
solutions. (1) The BLAST developers could put a copy of the statistics at
the top of the file, in a slightly different format to avoid confusing
existing parsers (I obviously have no control over this - but I did cc
Warren Gish). (2) BPlite could be rewritten to use files instead of pipes.
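
To make the temp-file idea concrete, here is a minimal sketch in Python (not
BPlite's actual Perl code). It copies a stream to a temporary file and then
reads that file twice: once for the trailing statistics, once for the hits.
The function name spool_and_parse and the "Statistics:" marker are
assumptions made for illustration; a real BLAST report's trailing section
would need its own detection logic.

```python
import shutil
import tempfile

# Hypothetical marker for the start of the trailing statistics block.
# This is a toy convention, not a claim about any BLAST output format.
STATS_MARKER = "Statistics:"


def spool_and_parse(stream):
    """Copy a text stream to a temp file, then parse it in two passes.

    Returns (stats_lines, hit_names).
    """
    with tempfile.TemporaryFile(mode="w+") as tmp:
        # Spool the whole stream to disk so it can be read more than once.
        shutil.copyfileobj(stream, tmp)

        # Pass 1: collect everything from the statistics marker onward.
        tmp.seek(0)
        stats = []
        in_stats = False
        for line in tmp:
            if line.startswith(STATS_MARKER):
                in_stats = True
            if in_stats:
                stats.append(line.rstrip("\n"))

        # Pass 2: rewind and collect hit names (lines starting with ">").
        tmp.seek(0)
        hits = [line[1:].strip() for line in tmp if line.startswith(">")]

    return stats, hits
```

With this shape the statistics are known before the hits are handed to the
caller, at the cost of one extra read of the temp file.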
When I designed BPlite, I specifically wanted something that worked well
in pipes. But because of this, it has always been a serious limitation
that the statistics aren't part of the report object. In retrospect, I'm
not sure how important it is to abstract the input stream, and if I did it
over, I would use a file instead of a pipe. Myself, I almost never pipe
BLAST directly to BPlite. Most of the time I like to keep the report
around in case there are errors or to see the data in its native format.
Disks are cheap, but time isn't, and you can save a lot of time by keeping
records as you go. There are also performance reasons to write the file. I
have found that it takes less time to write the report to disk and then
parse it than to parse it on the fly. This may be because there is less
context switching. If you have some kind of RAM disk (e.g. /tmp on Solaris
or ramfs in Linux 2.4), you may get the best performance. But I don't
think the performance issue is nearly as important as the functionality.
Let me take a poll.
1) Do you find the pipe useful or is a file just as good?
2) If you had to choose, would you rather have statistics or a pipe?
3) Would you like statistics at the top of the BLAST report? (again, out
of my control)
4) How about statistics only if you are parsing a file and not a stream?
(I would just check if it was a file or a GLOB and act accordingly -
parseStats() would only be active if a file)
5) How annoying is it that there are two versions of BPlite?
6) Would you buy a book called "BLAST in a Nutshell" or "BLAST: Theory and
Practice"? (I'm trying to convince a certain someone to write such a book
with me)
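
For question 4, the behavior I have in mind might look roughly like the
following Python sketch (again, not BPlite's Perl). The class name Report,
the method parse_stats, and the "Statistics:" marker are all hypothetical.
The only point is that statistics are available when the source is a file
path and unavailable when it is an open stream.

```python
import os


class Report:
    """Toy report wrapper: statistics only when the source is a file."""

    def __init__(self, source):
        # A str or path-like object is treated as a file on disk;
        # anything else is assumed to be an already-open stream.
        self.is_file = isinstance(source, (str, os.PathLike))
        self._source = source

    def parse_stats(self):
        """Return the trailing statistics lines, or None for a stream."""
        if not self.is_file:
            # A stream cannot be rewound, so statistics are not offered.
            return None
        with open(self._source) as fh:
            lines = fh.read().splitlines()
        for i, line in enumerate(lines):
            # "Statistics:" is an assumed marker, for illustration only.
            if line.startswith("Statistics:"):
                return lines[i:]
        return []
```

A caller would test the return value for None to tell "no statistics
available for a stream" apart from "file had no statistics section".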
-Ian