[Bioperl-l] frac_aligned_query returning results >1.

Thiago Venancio thiago.venancio at gmail.com
Sat Mar 3 12:41:39 UTC 2007


Hi all.

Sorry about this, but the bug persists. The number of problematic cases is
now very low (3 out of 35139), but they are still present.

Please find attached an example buggy blast report.

The line I use to call the function is:
print $result->query_name."\t".$hit->frac_aligned_query."\n";

The warning below still appears many times while processing reports, so I
think it is not due to the same bug.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Undefined sub-sequence (821,821). Valid range = 778 - 821
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:328
STACK: Bio::Search::HSP::HSPI::matches /usr/share/perl5/Bio/Search/HSP/HSPI.pm:711
STACK: Bio::Search::SearchUtils::_adjust_contigs /usr/share/perl5/Bio/Search/SearchUtils.pm:421
STACK: Bio::Search::SearchUtils::tile_hsps /usr/share/perl5/Bio/Search/SearchUtils.pm:200
STACK: Bio::Search::Hit::GenericHit::frac_aligned_query /usr/share/perl5/Bio/Search/Hit/GenericHit.pm:1145
STACK: ./geraStatGenome.pl:34
-----------------------------------------------------------
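
The range in the message is worth a second look: with 1-based inclusive
coordinates, (821,821) is the last position of 778 - 821 and should be valid.
The sketch below only illustrates that kind of edge condition in plain Python.
It is not the code in HSPI::matches(), and both function names are
hypothetical.

```python
def in_range_inclusive(start, end, lo, hi):
    """True if the inclusive sub-range (start, end) lies within lo..hi."""
    return lo <= start <= end <= hi


def in_range_off_by_one(start, end, lo, hi):
    """Hypothetical faulty check that treats the upper bound as exclusive."""
    return lo <= start <= end < hi


# The sub-range from the exception message above.
print(in_range_inclusive(821, 821, 778, 821))
print(in_range_off_by_one(821, 821, 778, 821))
```

The inclusive check accepts the last position, while the hypothetical
off-by-one variant rejects it, which is the pattern the exception suggests.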

I have checked the code, but I have no idea what is happening in this
case. The attached file produces both the ">1" result and the exception,
so it could be useful.
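
For reference, the query ranges of the HSPs against supercont1.60 in the
attached report can be checked by hand. The sketch below is plain Python, not
BioPerl's tile_hsps. It assumes the fraction is the merged (union) coverage of
the query divided by the query length, and the function names are made up for
illustration. Taking the maximum end when merging also handles ranges that are
fully contained in others, the case Sendu mentions below.

```python
def merge_ranges(ranges):
    """Merge inclusive (start, end) ranges into non-overlapping contigs."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlapping, adjacent, or fully contained: extend the end only
            # if this range reaches further than the current contig.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(contig) for contig in merged]


def frac_aligned(ranges, seq_length):
    """Fraction of the sequence covered by the union of the ranges."""
    covered = sum(end - start + 1 for start, end in merge_ranges(ranges))
    return covered / seq_length


# Query coordinates of the seven HSPs for supercont1.60 in the attached report.
hsp_query_ranges = [
    (336, 728), (336, 728), (1, 338), (94, 338),
    (1, 98), (727, 820), (778, 821),
]
QUERY_LENGTH = 821  # "(821 letters)" in the report header

print(merge_ranges(hsp_query_ranges))
print(frac_aligned(hsp_query_ranges, QUERY_LENGTH))
```

Under that assumption the seven ranges collapse to a single contig covering
1 to 821, so the fraction for this hit should be exactly 1.0. Any value above
1 means some overlap was counted more than once.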

Thiago


On 3/2/07, Steve Chervitz <sac at bioperl.org> wrote:
>
> Glad you fixed the problem, Sendu.
>
> I thought this might have been due to a problem in HSPI::matches() since
> it was reporting (1507,1507) as an invalid range within (1444,1507), when it
> should be valid (the last position). So it looked like an edge condition
> bug, but I didn't confirm. So there still could be a lingering problem in
> the matches() function, or in the way the matches string is parsed from the
> report.
>
> Speaking of which, HSPI::matches() is quite BLAST-specific. It's even
> format specific, since it won't work if you are parsing in tabular blast
> reports as they lack any string of match symbols. I thought about moving the
> matches implementation in HSPI into BlastHSP.pm, but that module appears
> to not be used anymore. Not sure the way to go here.
>
> Steve
>
> On 3/2/07, Thiago Venancio <thiago.venancio at gmail.com> wrote:
>
> > Hi Sendu,
> >
> > Great to know you fixed the problem.
> > I have updated SearchUtils and it seems to be correct now.
> >
> > Best!
> >
> > Thiago
> >
> >
> > On 3/2/07, Sendu Bala <bix at sendu.me.uk> wrote:
> > >
> > > Thiago Venancio wrote:
> > > > Hi Sendu and Chris,
> > > >
> > > > Thanks for the help.
> > > > As I mentioned, I have updated my SearchUtils file from:
> > > >
> > > > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm
> > > >
> > > > I am also using the latest BioPerl version, installed from CPAN.
> > > >
> > > > Please find a buggy blast report attached.
> > > > In this case, frac_aligned_query() outputs "1.04", but I have
> > > > others with "1.57", for example.
> > > >
> > > > To quantify, I got ">1" values in only 61 out of 53,377.
> > >
> > > Many thanks for that.
> > >
> > > I've committed another fix for SearchUtils so please get revision 1.23
> > > and try again. Hopefully all 61 will no longer be >1, but if any are
> > > please send me sample blast files again.
> > >
> > > For anyone interested, the bug was due to a completely unbelievable
> > > oversight on my part in the contig merging algorithm: I forgot to deal
> > > with contigs that were fully contained by others. Wow!
> > >
> >
> >
> >
> > --
> > "The way to get started is to quit talking and begin doing."
> >       Walt Disney
> >
> > ========================
> > Thiago Motta Venancio, MSc
> > PhD student in Bioinformatics
> > University of Sao Paulo
> > ========================
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>


-------------- next part --------------
BLASTN 2.2.6 [Apr-09-2003]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= AEDES_05359.C
         (821 letters)

Database: aaegypti.SCAFFOLDS-MASKED.AEDES1.fa
           4758 sequences; 1,383,971,543 total letters

Searching..........done

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

supercontig:1:supercont1.60:1:2993848:1 supercontig supercont1.60     779   0.0

>supercontig:1:supercont1.60:1:2993848:1 supercontig supercont1.60
          Length = 2993848

 Score =  779 bits (393), Expect = 0.0
 Identities = 393/393 (100%)
 Strand = Plus / Minus


Query: 336     cctttcatttttacggtgaccttcaccatcggcttctgatgacggcaaaaacgtgtgtgc 395
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2976894 cctttcatttttacggtgaccttcaccatcggcttctgatgacggcaaaaacgtgtgtgc 2976835


Query: 396     ctaagttacatgtgccaaaagtttctatttctaccgagtcttgcgtcgtgtgtcgtgagt 455
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2976834 ctaagttacatgtgccaaaagtttctatttctaccgagtcttgcgtcgtgtgtcgtgagt 2976775


Query: 456     gaagattgggaagagaacgaaagcctactaaaagcttttttggcatggtgacaagtctcc 515
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2976774 gaagattgggaagagaacgaaagcctactaaaagcttttttggcatggtgacaagtctcc 2976715


Query: 516     acgtcttgcgaaatggcgtttccttttatagccacgggtgttcccacacgttttggggta 575
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2976714 acgtcttgcgaaatggcgtttccttttatagccacgggtgttcccacacgttttggggta 2976655


Query: 576     ctgtcgggagtagttgctatacgttcaacaggtttaattttgccttgtccgacatgaact 635
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2976654 ctgtcgggagtagttgctatacgttcaacaggtttaattttgccttgtccgacatgaact 2976595


Query: 636     ttttcgggttgtccaggtgtaggagttgcagctacgagttggcgcaacaggaatgagatc 695
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2976594 ttttcgggttgtccaggtgtaggagttgcagctacgagttggcgcaacaggaatgagatc 2976535


Query: 696     attttgttttgttctggcgggattcttactgct 728
               |||||||||||||||||||||||||||||||||
Sbjct: 2976534 attttgttttgttctggcgggattcttactgct 2976502



 Score =  726 bits (366), Expect = 0.0
 Identities = 388/394 (98%), Gaps = 1/394 (0%)
 Strand = Plus / Minus


Query: 336     cctttcatttttacggtgaccttcaccatcggcttctgatgacggcaaaaacgtgtgtgc 395
               |||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||
Sbjct: 2955826 cctttcatttttacggtgaccttcaccatcggcttctgatgacgacaaaaacgtgtgtgt 2955767


Query: 396     ctaagttacatgtgccaaaagtttctatttctaccgagtcttgcgtcgtgtgtcgtgagt 455
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2955766 ctaagttacatgtgccaaaagtttctatttctaccgagtcttgcgtcgtgtgtcgtgagt 2955707


Query: 456     gaagattgggaagagaacgaaagcctactaaaagcttttttggcatggtgacaagtctcc 515
               | | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2955706 ggaaattgggaagagaacgaaagcctactaaaagcttttttggcatggtgacaagtctcc 2955647


Query: 516     acgtcttgcgaaatggcgtttccttttatagccacgggtgttcccacacg-ttttggggt 574
               |||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||
Sbjct: 2955646 acgtcttgcgaaatggcgtttccttttatagccacgggtgttcccacacgtttttggggt 2955587


Query: 575     actgtcgggagtagttgctatacgttcaacaggtttaattttgccttgtccgacatgaac 634
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2955586 actgtcgggagtagttgctatacgttcaacaggtttaattttgccttgtccgacatgaac 2955527


Query: 635     tttttcgggttgtccaggtgtaggagttgcagctacgagttggcgcaacaggaatgagat 694
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2955526 tttttcgggttgtccaggtgtaggagttgcagctacgagttggcgcaacaggaatgagat 2955467


Query: 695     cattttgttttgttctggcgggattcttactgct 728
               |||||||||||||||||| |||||||||||||||
Sbjct: 2955466 cattttgttttgttctggggggattcttactgct 2955433



 Score =  630 bits (318), Expect = e-178
 Identities = 333/338 (98%)
 Strand = Plus / Minus


Query: 1       gaaactttgtaattaagtgtaaaatatctgcctatctgtgaatttcgccagactatcaat 60
               |||||||||||||||||||||||||| ||||||| |||||||||||||||||||||||||
Sbjct: 2966288 gaaactttgtaattaagtgtaaaatacctgcctacctgtgaatttcgccagactatcaat 2966229


Query: 61      ccatggttaacttttgtcctatcgtcaagatatagtttacaaagatagattattgattat 120
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2966228 ccatggttaacttttgtcctatcgtcaagatatagtttacaaagatagattattgattat 2966169


Query: 121     tgatcttaccaagaaacttgttgattacttcgatcgagacctggaatgattgcacacaca 180
               |||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2966168 tgatcttacccagaaacttgttgattacttcgatcgagacctggaatgattgcacacaca 2966109


Query: 181     gcaatgctctgacacctacttcttcgtacaatatttctgcctctttgttatcatcgtctt 240
               ||||||||||||||||||||||||||||||||||||||||||||| ||||||||| ||||
Sbjct: 2966108 gcaatgctctgacacctacttcttcgtacaatatttctgcctcttcgttatcatcatctt 2966049


Query: 241     cgtgggcaattgggtccgaaccctccgaatcaaatttgtcgggctctacttctcttttgg 300
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2966048 cgtgggcaattgggtccgaaccctccgaatcaaatttgtcgggctctacttctcttttgg 2965989


Query: 301     gattcatagtggcttgacctcaagcgctaattaatcct 338
               ||||||||||||||||||||||||||||||||||||||
Sbjct: 2965988 gattcatagtggcttgacctcaagcgctaattaatcct 2965951



 Score =  486 bits (245), Expect = e-135
 Identities = 245/245 (100%)
 Strand = Plus / Minus


Query: 94      agtttacaaagatagattattgattattgatcttaccaagaaacttgttgattacttcga 153
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2980127 agtttacaaagatagattattgattattgatcttaccaagaaacttgttgattacttcga 2980068


Query: 154     tcgagacctggaatgattgcacacacagcaatgctctgacacctacttcttcgtacaata 213
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2980067 tcgagacctggaatgattgcacacacagcaatgctctgacacctacttcttcgtacaata 2980008


Query: 214     tttctgcctctttgttatcatcgtcttcgtgggcaattgggtccgaaccctccgaatcaa 273
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2980007 tttctgcctctttgttatcatcgtcttcgtgggcaattgggtccgaaccctccgaatcaa 2979948


Query: 274     atttgtcgggctctacttctcttttgggattcatagtggcttgacctcaagcgctaatta 333
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2979947 atttgtcgggctctacttctcttttgggattcatagtggcttgacctcaagcgctaatta 2979888


Query: 334     atcct 338
               |||||
Sbjct: 2979887 atcct 2979883



 Score =  194 bits (98), Expect = 3e-47
 Identities = 98/98 (100%)
 Strand = Plus / Minus


Query: 1       gaaactttgtaattaagtgtaaaatatctgcctatctgtgaatttcgccagactatcaat 60
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2982763 gaaactttgtaattaagtgtaaaatatctgcctatctgtgaatttcgccagactatcaat 2982704


Query: 61      ccatggttaacttttgtcctatcgtcaagatatagttt 98
               ||||||||||||||||||||||||||||||||||||||
Sbjct: 2982703 ccatggttaacttttgtcctatcgtcaagatatagttt 2982666



 Score =  145 bits (73), Expect = 2e-32
 Identities = 87/94 (92%)
 Strand = Plus / Minus


Query: 727     cttaaaaaattctacnnnnnnngtttacaatatcaaaactacagtcgacacacatatttt 786
               |||||||||||||||       ||||||||||||||||||||||||||||||||||||||
Sbjct: 2976416 cttaaaaaattctactttttttgtttacaatatcaaaactacagtcgacacacatatttt 2976357


Query: 787     gttaatttgtaggtgttgcttcgattcatcttca 820
               ||||||||||||||||||||||||||||||||||
Sbjct: 2976356 gttaatttgtaggtgttgcttcgattcatcttca 2976323



 Score = 71.9 bits (36), Expect = 3e-10
 Identities = 42/44 (95%)
 Strand = Plus / Minus


Query: 778     acatattttgttaatttgtaggtgttgcttcgattcatcttcac 821
               ||||||||||||||||||||||||| ||||||||||||| ||||
Sbjct: 2955299 acatattttgttaatttgtaggtgtggcttcgattcatcatcac 2955256


>supercontig:1:supercont1.971:1:313087:1 supercontig supercont1.971
          Length = 313087

 Score =  139 bits (70), Expect = 1e-30
 Identities = 103/114 (90%)
 Strand = Plus / Plus


Query: 192    acacctacttcttcgtacaatatttctgcctctttgttatcatcgtcttcgtgggcaatt 251
              |||| ||||||||||||| ||||||||||||||| ||||||||||||||||||||| |||
Sbjct: 202647 acacttacttcttcgtacgatatttctgcctcttcgttatcatcgtcttcgtgggccatt 202706


Query: 252    gggtccgaaccctccgaatcaaatttgtcgggctctacttctcttttgggattc 305
              |||||||| ||||||||| ||| ||| |||||||||| |||| ||| |||||||
Sbjct: 202707 gggtccgatccctccgaagcaattttttcgggctctatttctttttcgggattc 202760



 Score = 71.9 bits (36), Expect = 3e-10
 Identities = 48/52 (92%)
 Strand = Plus / Plus


Query: 123    atcttaccaagaaacttgttgattacttcgatcgagacctggaatgattgca 174
              |||||||||||||||||||| ||||| || ||||||||| ||||||||||||
Sbjct: 202569 atcttaccaagaaacttgtttattacgtctatcgagacccggaatgattgca 202620



 Score = 71.9 bits (36), Expect = 3e-10
 Identities = 72/84 (85%)
 Strand = Plus / Plus


Query: 22     aaatatctgcctatctgtgaatttcgccagactatcaatccatggttaacttttgtccta 81
              ||||| ||||||| |||||| |||||||| ||  ||||||||||| ||||||||||||||
Sbjct: 202376 aaataactgcctacctgtgattttcgccatacattcaatccatggataacttttgtccta 202435


Query: 82     tcgtcaagatatagtttacaaaga 105
              ||  |||||||  ||| |||||||
Sbjct: 202436 tctccaagatacggttcacaaaga 202459


  Database: aaegypti.SCAFFOLDS-MASKED.AEDES1.fa
    Posted date:  Nov 6, 2006  5:26 PM
  Number of letters in database: 1,383,971,543
  Number of sequences in database:  4758

Lambda     K      H
    1.37    0.711     1.31

Gapped
Lambda     K      H
    1.37    0.711     1.31


Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Hits to DB: 758,150
Number of Sequences: 4758
Number of extensions: 758150
Number of successful extensions: 7086
Number of sequences better than 1.0e-05: 4
Number of HSP's better than  0.0 without gapping: 4
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 7012
Number of HSP's gapped (non-prelim): 73
length of query: 821
length of database: 1,383,971,543
effective HSP length: 20
effective length of query: 801
effective length of database: 1,383,876,383
effective search space: 1108484982783
effective search space used: 1108484982783
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 29 (58.0 bits)

