[Biojava-l] BlastLikeSearchBuilder and queryIDs

Keith James kdj at sanger.ac.uk
Tue May 13 10:37:38 EDT 2003


>>>>> "Frank" == Frank Vernaillen <fr_ve at hotmail.com> writes:

    Frank> Hello!  I'm planning to use BioJava for parsing some
    Frank> relatively large (multi-megabyte, flat-file) Blast result
    Frank> files. My idea was to parse the data somewhat along the
    Frank> lines of http://bioconf.otago.ac.nz/biojava/BlastParser.htm
    Frank> and
    Frank> http://bioconf.otago.ac.nz/biojava/ExtractSearchInformation.htm.

    Frank> This way I end up with a Vector of
    Frank> SeqSimilaritySearchResults. The SeqSimilaritySearchResult
    Frank> interface offers a getQuerySequence() method, but it
    Frank> returns a SymbolList, not a Sequence. Now this is a
    Frank> problem, because I can't seem to obtain the *queryID* of
    Frank> the sequence anymore, only the sequence symbols
    Frank> themselves. Was this a deliberate design choice?

To be honest, I don't know; the original interface design predates my
involvement. I've generally been (over)cautious about changing these
interfaces, but I can see that this is a real problem. Perhaps I can
squeeze the change in before the release, i.e. change
getQuerySequence() to return a Sequence (as its name suggests) rather
than a SymbolList. I'll aim to do it ASAP, perhaps this evening.
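To make the trade-off concrete, here is a minimal, self-contained Java
sketch. SymbolList, Sequence, SearchResult and SequenceSearchResult
below are simplified, hypothetical stand-ins that only mirror the shape
of the BioJava types (a Sequence is a SymbolList that also carries a
name); they are not the real org.biojava interfaces. With a SymbolList
return type, client code has to downcast to recover the query ID and
gets nothing when no Sequence is returned; with a Sequence return type
the ID is part of the contract.

```java
import java.util.Optional;

public class QueryIdSketch {

    // Hypothetical stand-in for a symbol list: residues only, no identifier.
    interface SymbolList {
        String seqString();
    }

    // Hypothetical stand-in for a sequence: a SymbolList that also has a name,
    // which is where a query ID would live.
    interface Sequence extends SymbolList {
        String getName();
    }

    // Current shape: the result only promises a SymbolList.
    interface SearchResult {
        SymbolList getQuerySequence();
    }

    // Proposed shape: the result promises a full Sequence.
    interface SequenceSearchResult {
        Sequence getQuerySequence();
    }

    static class SimpleSequence implements Sequence {
        private final String name;
        private final String residues;

        SimpleSequence(String name, String residues) {
            this.name = name;
            this.residues = residues;
        }

        public String getName() { return name; }

        public String seqString() { return residues; }
    }

    // With a SymbolList return type the ID can only be recovered by a
    // downcast, and only if the implementation happens to return a Sequence.
    static Optional<String> queryIdFromSymbolList(SearchResult result) {
        SymbolList query = result.getQuerySequence();
        if (query instanceof Sequence) {
            return Optional.of(((Sequence) query).getName());
        }
        return Optional.empty();
    }

    // With a Sequence return type the ID is part of the interface contract.
    static String queryIdFromSequence(SequenceSearchResult result) {
        return result.getQuerySequence().getName();
    }

    public static void main(String[] args) {
        Sequence named = new SimpleSequence("query_1", "ATGCATGC");
        SymbolList bare = () -> "ATGCATGC";

        SearchResult returnsNamed = () -> named;
        SearchResult returnsBare = () -> bare;
        SequenceSearchResult proposed = () -> named;

        System.out.println(queryIdFromSymbolList(returnsNamed)); // Optional[query_1]
        System.out.println(queryIdFromSymbolList(returnsBare));  // Optional.empty
        System.out.println(queryIdFromSequence(proposed));       // query_1
    }
}
```

Widening the return type from SymbolList to Sequence stays source
compatible for callers that only use the symbols, since a Sequence is
still a SymbolList.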

Recently the org.biojava.bio.search interfaces have been made
Annotatable, and the Annotation object associated with each Result,
Hit and SubHit is now used to capture all the data sent by the SAX
parser to the result builder. These are all stored as String key-value
pairs; I just need to document which pairs are available for Blast
output. I'll check this in along with the getQuerySequence() change if
nobody objects.
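The idea of keeping every parsed value as a String key-value pair can
be sketched as below. This is a hypothetical, self-contained
illustration, not BioJava code: AnnotatedResult, ResultBuilder and
addProperty are made-up names, and the keys "queryId" and "databaseId"
are invented examples, since the actual keys for Blast had not yet been
documented.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class AnnotationSketch {

    // Hypothetical stand-in for an annotatable search object (Result, Hit
    // or SubHit): every value the parser reports is kept as a String pair.
    static class AnnotatedResult {
        private final Map<String, String> annotation;

        AnnotatedResult(Map<String, String> annotation) {
            // Defensive copy, exposed read-only.
            this.annotation = Collections.unmodifiableMap(new LinkedHashMap<>(annotation));
        }

        Map<String, String> getAnnotation() {
            return annotation;
        }
    }

    // Hypothetical builder: collects the pairs sent by the parser, then
    // freezes them into the result's annotation.
    static class ResultBuilder {
        private final Map<String, String> pending = new LinkedHashMap<>();

        void addProperty(String key, String value) {
            pending.put(key, value);
        }

        AnnotatedResult build() {
            AnnotatedResult result = new AnnotatedResult(pending);
            pending.clear(); // ready for the next result
            return result;
        }
    }

    public static void main(String[] args) {
        ResultBuilder builder = new ResultBuilder();
        // Invented key names, for illustration only.
        builder.addProperty("queryId", "query_1");
        builder.addProperty("databaseId", "swissprot");
        AnnotatedResult result = builder.build();

        System.out.println(result.getAnnotation().get("queryId")); // query_1
        System.out.println(result.getAnnotation()); // {queryId=query_1, databaseId=swissprot}
    }
}
```

Storing everything as Strings keeps the builder format-agnostic; the
cost is that callers must know (hence the need to document) which keys
each parser emits.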

Keith

-- 

- Keith James <kdj at sanger.ac.uk> bioinformatics programming support -
- Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, UK -


