[Biojava-l] BLAST Parser for extracting all BLAST data?

Y D Sun Yudong.Sun at newcastle.ac.uk
Sun Jun 26 05:42:08 EDT 2005


Hi,

I want to extract all of the data from BLASTP results. In the following
hit, for example, I need the lengths of the query and subject proteins,
the identities (all three values: 54, 124 and 43%), the positives (79,
124 and 63%), and the gaps (3, 124 and 2%). Can the BLASTLikeSAXParser
extract all of this information? I can't find methods in the
SeqSimilaritySearchHit and SeqSimilaritySearchSubHit APIs to retrieve
these values. Does Biojava provide any methods for this purpose?

Thanks,

George


BLASTP 2.2.5 [Nov-16-2002]

Query= Prot0001
         (138 letters)

Database: /work/nys1/fasta/protein/AE000782.pro.fasta
           2407 sequences; 662,866 total letters

Searching.....done

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

Prot0002                                                           100   1e-23
Prot0003                                                            74   2e-15
Prot0004                                                            43   3e-06

>Prot0002
          Length = 138

 Score =  100 bits (250), Expect = 1e-23
 Identities = 54/124 (43%), Positives = 79/124 (63%), Gaps = 3/124 (2%)

Query: 18  NARTKFTDIAKTLNLTEAAIRKRIKKLEENQIIKRYSIDIDYKKLGYNMAIIGLDIDMDY 77
           NAR   T IAK LN+TEAA+RKRI  LE  + I  Y   I+YKK+G + ++ G+D+D D
Sbjct: 15  NARIPKTRIAKELNVTEAAVRKRIANLERREEILGYKAIINYKKVGLSASLTGVDVDPDK 74

Query: 78  FPKIIKELEKRKEFLHIYSSAGDHDIMVIAIYK---DLEEIYNYLKNLKGVKRVCPAIII 134
             K+++EL+  +    ++ + GDH IM   I K   +L EI+  +  ++GVKRVCP+II
Sbjct: 75  LWKVVEELKDLESVKSLWLTTGDHTIMAEIIAKSVQELSEIHQKIAEMEGVKRVCPSIIT 134

Query: 135 DQIK 138
           D +K
Sbjct: 135 DIVK 138
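
For reference, the fields asked about above (query length, subject length,
and the three numbers behind each of Identities, Positives and Gaps) can be
read straight from the report text with regular expressions. The following
is only a minimal sketch in plain Java: it does not use BLASTLikeSAXParser
or any other Biojava class, the class and method names are hypothetical,
and it assumes the text layout of the BLASTP 2.2.5 report shown here. It
also reads only the first hit it finds; a report with several hits would
first have to be split on the lines starting with ">".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Biojava: pulls summary numbers out of
// the plain-text BLASTP report by pattern matching.
class BlastHitStatsSketch {

    // "(138 letters)" under the "Query=" line.
    private static final Pattern QUERY_LENGTH =
            Pattern.compile("\\(([\\d,]+)\\s+letters\\)");

    // "Length = 138" under the ">Prot0002" line.
    private static final Pattern SUBJECT_LENGTH =
            Pattern.compile("Length\\s*=\\s*([\\d,]+)");

    // "Identities = 54/124 (43%)", and the same shape for Positives and Gaps.
    private static final Pattern FRACTION = Pattern.compile(
            "(Identities|Positives|Gaps)\\s*=\\s*(\\d+)/(\\d+)\\s*\\((\\d+)%\\)");

    // Returns the first number matched by the pattern, or -1 if none is found.
    private static int firstLength(Pattern pattern, String report) {
        Matcher m = pattern.matcher(report);
        return m.find() ? Integer.parseInt(m.group(1).replace(",", "")) : -1;
    }

    static int queryLength(String report) {
        return firstLength(QUERY_LENGTH, report);
    }

    static int subjectLength(String report) {
        return firstLength(SUBJECT_LENGTH, report);
    }

    // Returns {count, alignment length, percent} for the first occurrence of
    // the given label ("Identities", "Positives" or "Gaps"), or null if the
    // label does not appear in the report.
    static int[] fraction(String report, String label) {
        Matcher m = FRACTION.matcher(report);
        while (m.find()) {
            if (m.group(1).equals(label)) {
                return new int[] {
                    Integer.parseInt(m.group(2)),
                    Integer.parseInt(m.group(3)),
                    Integer.parseInt(m.group(4))
                };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A trimmed copy of the report above; real code would read the file.
        String report = String.join("\n",
                "Query= Prot0001",
                "         (138 letters)",
                ">Prot0002",
                "          Length = 138",
                " Score =  100 bits (250), Expect = 1e-23",
                " Identities = 54/124 (43%), Positives = 79/124 (63%), Gaps = 3/124 (2%)");

        System.out.println("query length   = " + queryLength(report));
        System.out.println("subject length = " + subjectLength(report));
        for (String label : new String[] {"Identities", "Positives", "Gaps"}) {
            int[] f = fraction(report, label);
            if (f != null) {
                System.out.println(label + " = " + f[0] + "/" + f[1] + " (" + f[2] + "%)");
            }
        }
    }
}
```

Matching on the raw text is fragile if the report format changes between
BLAST versions, so this is a fallback for when the parser API does not
expose the values, not a replacement for it.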


