[Biojava-dev] blast parsing continued

Doug Rusch drusch@tcag.org
Wed, 13 Nov 2002 13:47:06 -0500


Hi All,

This is something of a continuation of the thread about blast parsing (it ran under a "no subject" heading in October; unfortunately I have only just read it, because I was out of town when those messages were posted).

I think the use of sequenceDBs was a better approach than using just queryID, databaseID, and subjectID. At a minimum, blast output gives three valuable attributes for a sequence: the id, the definition line, and the length of the sequence. The problem is that there is no such thing as a sequence-less Sequence object. I implemented and tested an approach that creates a VirtualSequence object, built with a SymbolList.EMPTY and an overridden getLength method that returns the length reported in the blast output. This lets the parser keep all the valuable information a blast output gives you about a sequence, while still allowing you to use all the functionality of the sequenceDB classes.
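
To make the idea concrete, here is a rough sketch in plain Java. It is only an illustration and does not use the real BioJava interfaces: VirtualSequence, VirtualSequenceDB, and all of their methods are hypothetical stand-ins, and the ids and lengths in main are made up. The point is simply that the object stores no residues but still reports the id, definition line, and length taken from the blast output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a sequence that carries metadata but no residues.
// This is NOT the BioJava Sequence interface; all names are illustrative.
final class VirtualSequence {
    private final String id;
    private final String description;
    private final int reportedLength;

    VirtualSequence(String id, String description, int reportedLength) {
        if (id == null) {
            throw new IllegalArgumentException("id must not be null");
        }
        if (reportedLength < 0) {
            throw new IllegalArgumentException("length must not be negative");
        }
        this.id = id;
        this.description = description;
        this.reportedLength = reportedLength;
    }

    String getId() { return id; }

    String getDescription() { return description; }

    // Returns the length reported in the blast output,
    // not the number of symbols actually stored (which is zero).
    int getLength() { return reportedLength; }

    // No residues are kept; this plays the role of the empty symbol list.
    String residues() { return ""; }
}

// Hypothetical minimal database keyed by sequence id (assumed interface).
final class VirtualSequenceDB {
    private final Map<String, VirtualSequence> byId = new LinkedHashMap<>();

    void add(VirtualSequence seq) { byId.put(seq.getId(), seq); }

    VirtualSequence get(String id) {
        VirtualSequence seq = byId.get(id);
        if (seq == null) {
            throw new NoSuchElementException("no sequence with id " + id);
        }
        return seq;
    }

    int size() { return byId.size(); }
}

class VirtualSequenceDemo {
    public static void main(String[] args) {
        // Made-up values standing in for what a parser would read from a hit.
        VirtualSequenceDB subjects = new VirtualSequenceDB();
        subjects.add(new VirtualSequence("subject1", "example definition line", 1500));

        VirtualSequence hit = subjects.get("subject1");
        System.out.println(hit.getId() + " | " + hit.getDescription()
                + " | length " + hit.getLength()
                + " | stored residues " + hit.residues().length());
    }
}
```

A parser written this way could hand back a database of query or subject entries without ever loading the underlying sequences, which is the trade-off I am proposing.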

What are everyone else's opinions on this?

Doug Rusch,
TCAG.org