[Biojava-l] BLAST Parser for extracting all BLAST data?

Y D Sun Yudong.Sun at newcastle.ac.uk
Tue Jun 28 06:13:25 EDT 2005


Hi,

Using the example, I can extract all the information I need except the
length of the query sequence. Is there a "hidden" method that reports
the query length, shown in parentheses as "(138 letters)" in the sample
output below?
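If the SAX events turn out not to carry this value, one fallback is to read it straight from the report text. The following is a minimal sketch using plain java.util.regex, not a BioJava API. The class and method names (QueryLengthExtractor, queryLength) are hypothetical, and it assumes the legacy text layout in the quoted sample, where "(N letters)" follows the "Query=" line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava: reads the query length from the
// "(N letters)" line that follows "Query=" in a plain-text BLAST report.
final class QueryLengthExtractor {

    // "Query=" followed, possibly on a later line, by "(<n> letters)".
    // Commas are accepted in case long lengths use thousands separators,
    // as the "Database:" line in the sample does.
    private static final Pattern QUERY_LENGTH =
            Pattern.compile("Query=.*?\\(\\s*([\\d,]+)\\s+letters\\)", Pattern.DOTALL);

    private QueryLengthExtractor() {
    }

    /** Returns the length of the first query in the report, or -1 if none is found. */
    static int queryLength(String report) {
        Matcher m = QUERY_LENGTH.matcher(report);
        if (!m.find()) {
            return -1;
        }
        return Integer.parseInt(m.group(1).replace(",", ""));
    }

    public static void main(String[] args) {
        String header = "BLASTP 2.2.5 [Nov-16-2002]\n\n"
                + "Query= Prot0001\n"
                + "         (138 letters)\n";
        System.out.println(queryLength(header)); // prints 138
    }
}
```

The reluctant ".*?" with DOTALL stops at the first "(N letters)" after "Query=", so for a multi-query report it returns only the first query's length.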

BTW, the addSubHitProperty() method doesn't report the Gaps data.
Fortunately, I don't need it at the moment.
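Until the parser reports Gaps, the whole statistics line can be parsed by hand. This is a minimal sketch using java.util.regex rather than BioJava. HspStats, parse and percent are hypothetical names, and it assumes a BLASTP line in unwrapped form, such as "Identities = 54/124 (43%), Positives = 79/124 (63%), Gaps = 3/124 (2%)". Treating Gaps as optional is also an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava: parses the
// "Identities = a/n (p%), Positives = b/n (q%), Gaps = c/n (r%)" line
// of a plain-text BLASTP alignment.
final class HspStats {

    // Gaps is optional here on the assumption that some reports omit it
    // for ungapped alignments; the Positives and Gaps denominators are
    // assumed to equal the Identities one (the alignment length).
    private static final Pattern STATS = Pattern.compile(
            "Identities\\s*=\\s*(\\d+)/(\\d+)\\s*\\((\\d+)%\\)"
            + ",\\s*Positives\\s*=\\s*(\\d+)/\\d+\\s*\\((\\d+)%\\)"
            + "(?:,\\s*Gaps\\s*=\\s*(\\d+)/\\d+\\s*\\((\\d+)%\\))?");

    final int identities;
    final int positives;
    final int gaps;
    final int alignLength;
    final int identityPct;
    final int positivePct;
    final int gapPct;

    private HspStats(int identities, int positives, int gaps, int alignLength,
                     int identityPct, int positivePct, int gapPct) {
        this.identities = identities;
        this.positives = positives;
        this.gaps = gaps;
        this.alignLength = alignLength;
        this.identityPct = identityPct;
        this.positivePct = positivePct;
        this.gapPct = gapPct;
    }

    /** Parses one statistics line; throws if it is not recognised. */
    static HspStats parse(String line) {
        Matcher m = STATS.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("No Identities/Positives statistics in: " + line);
        }
        // Groups 6 and 7 are null when the optional Gaps part is absent.
        int gaps = m.group(6) == null ? 0 : Integer.parseInt(m.group(6));
        int gapPct = m.group(7) == null ? 0 : Integer.parseInt(m.group(7));
        return new HspStats(
                Integer.parseInt(m.group(1)), // identities
                Integer.parseInt(m.group(4)), // positives
                gaps,
                Integer.parseInt(m.group(2)), // alignment length
                Integer.parseInt(m.group(3)), // identity %
                Integer.parseInt(m.group(5)), // positive %
                gapPct);
    }

    /** Percentage computed with integer division, i.e. truncated, not rounded. */
    static int percent(int count, int length) {
        return 100 * count / length;
    }

    public static void main(String[] args) {
        HspStats s = parse(" Identities = 54/124 (43%), Positives = 79/124 (63%), Gaps = 3/124 (2%)");
        System.out.println(s.identities + "/" + s.alignLength + ", " + s.positives + ", " + s.gaps);
        // prints 54/124, 79, 3
        System.out.println(percent(s.positives, s.alignLength)); // prints 63
    }
}
```

In the sample, 79/124 is about 63.7%, yet the report prints 63%. The printed percentages therefore appear to be truncated rather than rounded, and percent() mirrors that with integer division in case you recompute them from the counts.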

Thanks,

George

>-----Original Message-----
>From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com] 
>Sent: 27 June 2005 03:25
>To: Y D Sun
>Subject: Re: [Biojava-l] BLAST Parser for extracting all BLAST data?
>
>Hello -
>
>Take a look at the BLAST examples in BioJava in Anger (follow 
>the cookbook link from the biojava.org page).
>
>In particular look at
>http://www.biojava.org/docs/bj_in_anger/blastecho.htm
>
>The example program will tell you which methods are being 
>called for what information and will give you some clues as to 
>where everything ends up.
>
>- Mark
>
>
>
>
>
>"Y D Sun" <Yudong.Sun at newcastle.ac.uk>
>Sent by: biojava-l-bounces at portal.open-bio.org
>06/26/2005 05:42 PM
>
> 
>        To:     <biojava-l at biojava.org>
>        cc:     (bcc: Mark Schreiber/GP/Novartis)
>        Subject:        [Biojava-l] BLAST Parser for extracting all BLAST data?
>
>
>Hi,
>
>I want to extract all the data from BLASTP results. In the 
>following hit, for example, I need the lengths of the query 
>and subject proteins, the identities (all three values: 54, 
>124 and 43%), the positives (79, 124 and 63%), and the gaps 
>(3, 124 and 2%). Can the BlastLikeSAXParser extract all of 
>this information? I can't find methods in the 
>SeqSimilaritySearchHit and SeqSimilaritySearchSubHit APIs to 
>retrieve these data. Does BioJava provide any methods for this purpose?
>
>Thanks,
>
>George
>
>
>BLASTP 2.2.5 [Nov-16-2002]
>
>Query= Prot0001
>         (138 letters)
>
>Database: /work/nys1/fasta/protein/AE000782.pro.fasta
>           2407 sequences; 662,866 total letters
>
>Searching.....done
>
>                                                                 Score    E
>Sequences producing significant alignments:                      (bits) Value
>
>Prot0002                                                           100   1e-23
>Prot0003                                                            74   2e-15
>Prot0004                                                            43   3e-06
>
>>Prot0002
>          Length = 138
>
> Score =  100 bits (250), Expect = 1e-23
> Identities = 54/124 (43%), Positives = 79/124 (63%), Gaps = 3/124 (2%)
>
>Query: 18  NARTKFTDIAKTLNLTEAAIRKRIKKLEENQIIKRYSIDIDYKKLGYNMAIIGLDIDMDY 77
>           NAR   T IAK LN+TEAA+RKRI  LE  + I  Y   I+YKK+G + ++ G+D+D D
>Sbjct: 15  NARIPKTRIAKELNVTEAAVRKRIANLERREEILGYKAIINYKKVGLSASLTGVDVDPDK 74
>
>Query: 78  FPKIIKELEKRKEFLHIYSSAGDHDIMVIAIYK---DLEEIYNYLKNLKGVKRVCPAIII 134
>             K+++EL+  +    ++ + GDH IM   I K   +L EI+  +  ++GVKRVCP+II
>Sbjct: 75  LWKVVEELKDLESVKSLWLTTGDHTIMAEIIAKSVQELSEIHQKIAEMEGVKRVCPSIIT 134
>
>Query: 135 DQIK 138
>           D +K
>Sbjct: 135 DIVK 138
>
>_______________________________________________
>Biojava-l mailing list  -  Biojava-l at biojava.org 
>http://biojava.org/mailman/listinfo/biojava-l
>
>
>
>


