[Bioperl-l] Blast Output and frac_aligned_query

Jason Stajich jason at cgt.duhs.duke.edu
Mon Jul 19 09:33:35 EDT 2004


On Mon, 19 Jul 2004, James Wasmuth wrote:

> First, apologies if this has been debated before; I didn't see it in the
> archive and have been away for a while, so I'm unclear on the current state of affairs.
>
> I have a bl2seq output (below) and when I extract its statistics, I am
> told that 156% of the query is aligned.
>
> This is probably because multiple HSPs are produced, as the protein appears
> highly repetitive. Would this mess up the tiling of the HSPs in its
> current implementation?

I guess so.  SteveC is the HSP tiling guru, so we would have to see what he
thinks.

I think a lot of people out there have HSP tiling code - it would be nice
to be able to incorporate more solutions to this problem so that one
could try different strategies...
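
As a rough illustration of one such strategy (a minimal sketch, not how
BioPerl's tiling code is implemented): merge the query ranges of all HSPs so
that each query position is counted only once, then divide by the query
length. The function names merge_intervals and frac_covered are made up for
this example; the ranges are the query coordinates of the nine HSPs in the
report quoted below. Note that this simple union ignores whether the HSPs
are consistent with each other on the subject, so it measures query coverage
rather than a true tiling.

```python
def merge_intervals(ranges):
    """Merge overlapping or adjacent 1-based inclusive (start, end) ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]


def frac_covered(ranges, seq_len):
    """Fraction of a sequence covered by the union of the given ranges."""
    covered = sum(end - start + 1 for start, end in merge_intervals(ranges))
    return covered / seq_len


# Query coordinates of the HSPs in the bl2seq report (query length 80).
query_hsps = [(1, 47), (50, 73), (40, 72), (9, 73), (40, 68),
              (43, 76), (27, 73), (42, 75), (49, 62)]
query_len = 80

# Summing HSP lengths without merging counts repeated regions many times.
naive = sum(end - start + 1 for start, end in query_hsps) / query_len
tiled = frac_covered(query_hsps, query_len)

print(f"naive sum of HSP lengths: {naive:.2f}")
print(f"merged query coverage:    {tiled:.2f}")
```

With merging, the covered fraction can never exceed 1, whichever HSPs are
passed in.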

You might also try using WU-BLAST with -links turned on, which provides
consistent groups of HSPs.  We haven't (yet) incorporated interpreting the
link information as a way to tile HSPs, but it would be a good project for
someone to try out (or for someone to donate, if they have already solved
this).

-jason

>
>
> cheers
> -james
>
>
> e = 2e-19
> s = 205
> b = 83.6
> aln_q = 1.56  !
> aln_h = 0.09
> id = 0.208
> cons = 0.256
> len = 332
>
>
>
> > Query= prediction
> >          (80 letters)
> >
> > >wormpep
> >           Length = 2592
> >
> >  Score = 83.6 bits (205), Expect = 2e-19
> >  Identities = 41/47 (87%), Positives = 43/47 (91%)
> >
> > Query: 1    SIRDEFSMNSAADSPMSTTGRPMVLTKAAMKAFNSTPPKKKNSSSGQ 47
> >             SIRDEFSMNSAADSPMSTTGRPMVLTKAAMKAFNSTPPKK+   + Q
> > Sbjct: 1528 SIRDEFSMNSAADSPMSTTGRPMVLTKAAMKAFNSTPPKKETDQAVQ 1574
> >
> >
> >
> >  Score = 29.3 bits (64), Expect = 0.004
> >  Identities = 13/24 (54%), Positives = 18/24 (75%)
> >
> > Query: 50  SSSGSSSDSSSXDGSTSSDDSXDD 73
> >            S S SSSDS S +GS+SS++  D+
> > Sbjct: 493 SGSDSSSDSDSEEGSSSSNEDSDE 516
> >
> >
> >
> >  Score = 26.2 bits (56), Expect = 0.036
> >  Identities = 13/33 (39%), Positives = 20/33 (60%), Gaps = 1/33 (3%)
> >
> > Query: 40  KKNSSSGQHDSSSGSSSDSSSXDGSTSSDDSXD 72
> >            ++N++SG  DSSS S S+  S   +  SD+  D
> > Sbjct: 488 QENNASGS-DSSSDSDSEEGSSSSNEDSDEQND 519
> >
> >
> >
> >  Score = 23.5 bits (49), Expect = 0.24
> >  Identities = 14/69 (20%), Positives = 31/69 (44%), Gaps = 4/69 (5%)
> >
> > Query: 9   NSAADSPMSTTGRPMV----LTKAAMKAFNSTPPKKKNSSSGQHDSSSGSSSDSSSXDGS 64
> >            +  + SP S+  R  +     T+++++   +   ++ N+S     S S S   SSS +
> > Sbjct: 454 DQGSSSPSSSRDRQNLHDPLQTRSSVEHHTNQEDQENNASGSDSSSDSDSEEGSSSSNED 513
> >
> > Query: 65  TSSDDSXDD 73
> >            +   +  D+
> > Sbjct: 514 SDEQNDVDE 522
> >
> >
> >
> >  Score = 21.9 bits (45), Expect = 0.68
> >  Identities = 10/29 (34%), Positives = 16/29 (55%)
> >
> > Query: 40  KKNSSSGQHDSSSGSSSDSSSXDGSTSSD 68
> >            + +S + + ++  GSSS SSS D     D
> > Sbjct: 443 RSSSPTSKSENDQGSSSPSSSRDRQNLHD 471
> >
> >
> >
> >  Score = 21.2 bits (43), Expect = 1.2
> >  Identities = 13/34 (38%), Positives = 16/34 (47%)
> >
> > Query: 43   SSSGQHDSSSGSSSDSSSXDGSTSSDDSXDDXVP 76
> >             S S    ++SGS S S +   STSS  S     P
> > Sbjct: 2327 SRSSTMGNNSGSPSASGTTSPSTSSSISSGPDSP 2360
> >
> >
> >
> >  Score = 21.2 bits (43), Expect = 1.2
> >  Identities = 12/47 (25%), Positives = 17/47 (36%)
> >
> > Query: 27   KAAMKAFNSTPPKKKNSSSGQHDSSSGSSSDSSSXDGSTSSDDSXDD 73
> >             K   KA      KKK+      D S   S+D    D   S+ +   +
> > Sbjct: 1144 KVRKKAEKEKLKKKKHRKGDSSDESDSDSNDELDLDVRKSTKEMTQE 1190
> >
> >
> >
> >  Score = 20.0 bits (40), Expect = 2.6
> >  Identities = 11/35 (31%), Positives = 18/35 (51%), Gaps = 1/35 (2%)
> >
> > Query: 42  NSSSGQHDSSSGSSS-DSSSXDGSTSSDDSXDDXV 75
> >            + SS   DS  GSSS +  S + +   ++  +D V
> > Sbjct: 495 SDSSSDSDSEEGSSSSNEDSDEQNDVDEEDDEDVV 529
> >
> >
> >
> >  Score = 18.5 bits (36), Expect = 7.6
> >  Identities = 7/14 (50%), Positives = 9/14 (64%)
> >
> > Query: 49   DSSSGSSSDSSSXD 62
> >             +SS+G  SDS   D
> > Sbjct: 1252 NSSNGEESDSEKAD 1265
> >
> >
> > Lambda     K      H
> >    0.294    0.109    0.279
> >
> > Gapped
> > Lambda     K      H
> >    0.267   0.0410    0.140
> >
> >
> > Matrix: BLOSUM62
> > Gap Penalties: Existence: 11, Extension: 1
> > Number of Hits to DB: 2307
> > Number of Sequences: 0
> > Number of extensions: 39
> > Number of successful extensions: 11
> > Number of sequences better than 10.0: 1
> > Number of HSP's better than 10.0 without gapping: 1
> > Number of HSP's successfully gapped in prelim test: 0
> > Number of HSP's that attempted gapping in prelim test: 0
> > Number of HSP's gapped (non-prelim): 10
> > length of query: 80
> > length of database: 115,000
> > effective HSP length: 56
> > effective length of query: 24
> > effective length of database: 114,944
> > effective search space:  2758656
> > effective search space used:  2758656
> > T: 11
> > A: 40
> > X1: 17 ( 7.2 bits)
> > X2: 38 (14.6 bits)
> > X3: 64 (24.7 bits)
> > S1: 35 (18.0 bits)
> > S2: 35 (18.1 bits)
>
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

