[Biojava-l] Extract accession number out of xml blast result

mark.schreiber at novartis.com mark.schreiber at novartis.com
Fri Nov 11 00:19:32 EST 2005

Another way to parse the results without using the full-blown object model 
and SequenceDB is to extend SearchContentAdapter and listen for the events 
of interest. The event that gives you the query id is a callback to 
setQueryID(String id) on the adapter.

Take a look at http://www.biojava.org/docs/bj_in_anger/blastecho.htm for 
some hints.
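
As a rough illustration of the idea, here is a minimal sketch of a handler that remembers the current query id and pairs it with each hit. To keep it compilable without BioJava on the classpath, it extends a small stand-in class (AdapterStandIn, hypothetical) that mirrors only the callbacks used; in real code you would extend SearchContentAdapter instead. The property key "subjectId" is an assumption, so check it against what your parser actually emits (the blastecho example prints the keys it sees).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Stand-in for BioJava's SearchContentAdapter, declared here only so the
 * sketch compiles on its own. Only the callbacks used below are mirrored;
 * the real adapter supplies no-op versions of many more events.
 */
class AdapterStandIn {
    public void setQueryID(String queryID) { }
    public void startHit() { }
    public void addHitProperty(Object key, Object value) { }
    public void endHit() { }
}

/** Collects "queryID<TAB>subjectID" pairs as parse events arrive. */
class QueryHitCollector extends AdapterStandIn {
    // Assumed key under which the subject id is reported; verify before relying on it.
    static final String SUBJECT_ID_KEY = "subjectId";

    private final List<String> pairs = new ArrayList<>();
    private String currentQuery;
    private String currentSubject;

    @Override
    public void setQueryID(String queryID) {
        // Fired once per search result; remember it for the hits that follow.
        currentQuery = queryID;
    }

    @Override
    public void startHit() {
        currentSubject = null;
    }

    @Override
    public void addHitProperty(Object key, Object value) {
        if (SUBJECT_ID_KEY.equals(key)) {
            currentSubject = String.valueOf(value);
        }
    }

    @Override
    public void endHit() {
        pairs.add(currentQuery + "\t" + currentSubject);
    }

    List<String> pairs() {
        return pairs;
    }

    public static void main(String[] args) {
        // Events are fired by hand here; a real parser would fire them
        // while reading the BLAST report.
        QueryHitCollector collector = new QueryHitCollector();
        collector.setQueryID("A12345");
        collector.startHit();
        collector.addHitProperty(SUBJECT_ID_KEY, "P00001");
        collector.endHit();
        for (String pair : collector.pairs()) {
            System.out.println(pair);
        }
    }
}
```

With the real adapter, an instance of the handler would be registered the same way the blastecho example registers its own handler.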

- Mark

"Richard HOLLAND" <hollandr at gis.a-star.edu.sg>
Sent by: biojava-l-bounces at portal.open-bio.org
11/11/2005 10:15 AM

        To:     "Andreas Scheucher" <andreas.scheucher at embl.de>, <biojava-l at biojava.org>
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        RE: [Biojava-l] Extract accession number out of xml blast result

As documented at BioJava in Anger, the subject's accession can be
obtained from the SeqSimilaritySearchHit using getSubjectID(). 

By reading the API, the query's accession can be obtained from
SeqSimilaritySearchResult using getQuerySequence().getName().

However... unfortunately, the query accession method above does not work
if you follow the BioJava in Anger example code!

BlastLikeSearchBuilder requires a SequenceDB and a
SequenceDBInstallation. The former should contain all sequences used in
the query, and the latter should be able to provide SequenceDB instances
corresponding to the databases used in the blast. For instance, if you
blasted query "A12345" vs. database "nr", then the SequenceDB instance
should return a meaningful value for getSequence("A12345"), and the
SequenceDBInstallation instance should return a meaningful value for
getSequenceDB("nr").

The example at BioJava in Anger uses a DummySequenceDB and
DummySequenceDBInstallation to pass to the BlastLikeSearchBuilder. Both
these instances generate the exact same response no matter what values
you pass to getSequence() and getSequenceDB() - they return a Sequence
or SequenceDB with the name of "dummy".

If you are really interested in the actual query accession, you would
need to provide your own SequenceDB which returned appropriately named
sequences. If your queries all come from an existing SequenceDB object,
you can just pass this straight in. Likewise, if you are really
interested in the target database name, you can construct or use some
other SequenceDBInstallation to provide the appropriate SequenceDB
instances.

BUT... you can get round all this object overkill by knowing a few
things about your query data before trying to parse it. First, when you
run BLAST on multiple query sequences in a single input file, the report
generated will contain the query sequences in the same order as the
input file. Second, the SeqSimilaritySearchResult objects are returned
in the same order as the results appear in the BLAST report, and there
will be one SeqSimilaritySearchResult object per query sequence. So, if
you have a list of your query sequence accessions in the order they
appear in the input file to BLAST, you can then maintain a counter which
increments each time you obtain the next SeqSimilaritySearchResult, and
that counter will provide a direct pointer into your list to tell you
which query accession you are currently working with. Likewise, you
should know already what blast database you blasted against, so you
shouldn't really need to get this information from the results.
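
A minimal sketch of that counter idea in plain Java, with no BioJava types involved. The class and method names (QueryOrderMatcher, idsFromFasta, label) are made up for illustration, and taking the first whitespace-delimited token of each FASTA header as the accession is an assumption about your input file. The result type is left generic; in practice it would be the SeqSimilaritySearchResult objects in the order the parser returned them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Pairs parsed results with query accessions purely by position. */
final class QueryOrderMatcher {

    /** A result together with the accession inferred from its position. */
    static final class Labelled<R> {
        final String queryAccession;
        final R result;

        Labelled(String queryAccession, R result) {
            this.queryAccession = queryAccession;
            this.result = result;
        }
    }

    /** Collects the first token of every '>' header line, in file order. */
    static List<String> idsFromFasta(List<String> fastaLines) {
        List<String> ids = new ArrayList<>();
        for (String line : fastaLines) {
            if (line.startsWith(">")) {
                String header = line.substring(1).trim();
                ids.add(header.split("\\s+")[0]);
            }
        }
        return ids;
    }

    /** Walks the results with a counter that indexes into the accession list. */
    static <R> List<Labelled<R>> label(List<String> queryAccessions, List<R> results) {
        // Guard: the approach only holds if there is one result per query.
        if (queryAccessions.size() != results.size()) {
            throw new IllegalArgumentException(
                "expected one result per query, got " + results.size()
                + " results for " + queryAccessions.size() + " queries");
        }
        List<Labelled<R>> labelled = new ArrayList<>();
        int counter = 0;
        for (R result : results) {
            labelled.add(new Labelled<>(queryAccessions.get(counter), result));
            counter++;
        }
        return labelled;
    }

    public static void main(String[] args) {
        List<String> fasta = Arrays.asList(">A12345 first query", "ACGT", ">B67890", "GGCC");
        // Plain strings stand in for the parsed result objects.
        List<String> results = Arrays.asList("result #1", "result #2");
        for (Labelled<String> item : label(idsFromFasta(fasta), results)) {
            System.out.println(item.queryAccession + " -> " + item.result);
        }
    }
}
```

The size check is only a sanity guard: if the number of results ever differs from the number of queries, the positional mapping cannot be trusted and it is better to fail than to mislabel hits.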


Richard Holland
Bioinformatics Specialist
GIS extension 8199

> -----Original Message-----
> From: biojava-l-bounces at portal.open-bio.org 
> [mailto:biojava-l-bounces at portal.open-bio.org] On Behalf Of 
> Andreas Scheucher
> Sent: Thursday, November 10, 2005 6:49 PM
> To: biojava-l at biojava.org
> Subject: [Biojava-l] Extract accession number out of xml blast result
> Hi,
> I'm parsing a BLAST result file for a multi-FASTA search 
> with BioJava.
> Now I'm wondering whether there really is no way to get the 
> accession number out of a BLAST hit. The XML tag with the 
> information 
> is there, but where is the corresponding method?
> Thanks for your effort.
> Regards,
> Andreas
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
