[Bioperl-l] frac_aligned_query returning results >1.
Thiago Venancio
thiago.venancio at gmail.com
Sat Mar 3 12:41:39 UTC 2007
Hi all.
Sorry about this, but the bug persists. Although the number of problematic
cases is very low (3 out of 35,139), they are still present.
Please find attached an example buggy blast report.
The line I use to call the function is:
print $result->query_name."\t".$hit->frac_aligned_query."\n";
The warning below still appears many times while processing reports, so I
think it is not caused by the same bug.
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Undefined sub-sequence (821,821). Valid range = 778 - 821
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:328
STACK: Bio::Search::HSP::HSPI::matches /usr/share/perl5/Bio/Search/HSP/HSPI.pm:711
STACK: Bio::Search::SearchUtils::_adjust_contigs /usr/share/perl5/Bio/Search/SearchUtils.pm:421
STACK: Bio::Search::SearchUtils::tile_hsps /usr/share/perl5/Bio/Search/SearchUtils.pm:200
STACK: Bio::Search::Hit::GenericHit::frac_aligned_query /usr/share/perl5/Bio/Search/Hit/GenericHit.pm:1145
STACK: ./geraStatGenome.pl:34
-----------------------------------------------------------
I have checked the code, but I have no idea what is happening in this
case. The attached file produces both the ">1" result and the exception
above, so it could be useful.
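
As a rough sanity check that is independent of BioPerl, the query ranges of
the HSPs in the attached report can be merged by hand. The sketch below is
plain Perl; merge_ranges and covered_fraction are hypothetical helper names
(they are not part of Bio::Search::SearchUtils), and it does not try to
reproduce whatever tile_hsps does internally. It only illustrates that once
overlapping and fully contained ranges are merged, the covered fraction of
the query cannot exceed 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): merge inclusive [start, end]
# ranges into non-overlapping contigs, absorbing ranges that are fully
# contained in another one.
sub merge_ranges {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $r (@sorted) {
        my ($start, $end) = @$r;
        if (@merged && $start <= $merged[-1][1] + 1) {
            # Overlapping, adjacent, or contained: extend only if it reaches further.
            $merged[-1][1] = $end if $end > $merged[-1][1];
        }
        else {
            push @merged, [$start, $end];
        }
    }
    return @merged;
}

# Hypothetical helper: fraction of the query covered by the merged ranges.
sub covered_fraction {
    my ($query_length, @ranges) = @_;
    my $covered = 0;
    $covered += $_->[1] - $_->[0] + 1 for merge_ranges(@ranges);
    return $covered / $query_length;
}

# Query coordinates of the seven HSPs against supercont1.60, copied from the
# attached report; the query is 821 letters long.
my @hsps = ([336, 728], [336, 728], [1, 338], [94, 338], [1, 98], [727, 820], [778, 821]);
printf "%.2f\n", covered_fraction(821, @hsps);    # prints 1.00
```

For the second hit (supercont1.971), the three query ranges 22-105, 123-174
and 192-305 do not overlap, so the same calculation gives 250/821, or
about 0.30.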
Thiago
On 3/2/07, Steve Chervitz <sac at bioperl.org> wrote:
>
> Glad you fixed the problem, Sendu.
>
> I thought this might have been due to a problem in HSPI::matches() since
> it was reporting (1507,1507) as an invalid range within (1444,1507), when it
> should be valid (the last position). So it looked like an edge condition
> bug, but I didn't confirm. So there still could be a lingering problem in
> the matches() function, or in the way the matches string is parsed from the
> report.
>
> Speaking of which, HSPI::matches() is quite BLAST-specific. It's even
> format specific, since it won't work if you are parsing in tabular blast
> reports as they lack any string of match symbols. I thought about moving the
> matches implementation in HSPI into BlastHSP.pm, but that module appears
> to not be used anymore. Not sure the way to go here.
>
> Steve
>
> On 3/2/07, Thiago Venancio < thiago.venancio at gmail.com> wrote:
>
> > Hi Sendu,
> >
> > Great to know you fixed the problem.
> > I have updated SearchUtils and it seems to be correct now.
> >
> > Best!
> >
> > Thiago
> >
> >
> > On 3/2/07, Sendu Bala <bix at sendu.me.uk> wrote:
> > >
> > > Thiago Venancio wrote:
> > > > Hi Sendu and Chris,
> > > >
> > > > Thanks for the help.
> > > > As I mentioned, I have updated my SearchUtils file from:
> > > >
> > > > http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Search/SearchUtils.pm
> > > >
> > > > I am also using the latest BioPerl version, installed from CPAN.
> > > >
> > > > Please find a buggy blast report attached.
> > > > In this case, frac_aligned_query() outputs "1.04", but I have others
> > > > with "1.57", for example.
> > > >
> > > > To put a number on it, I got ">1" values in only 61 / 53,377.
> > >
> > > Many thanks for that.
> > >
> > > I've committed another fix for SearchUtils so please get revision 1.23
> > > and try again. Hopefully all 61 will no longer be >1, but if any are
> > > please send me sample blast files again.
> > >
> > > For anyone interested, the bug was due to a completely unbelievable
> > > oversight on my part in the contig merging algorithm: I forgot to deal
> > > with contigs that were fully contained by others. Wow!
> > >
> >
> >
> >
> > --
> > "The way to get started is to quit talking and begin doing."
> > Walt Disney
> >
> > ========================
> > Thiago Motta Venancio, MSc
> > PhD student in Bioinformatics
> > University of Sao Paulo
> > ========================
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>
--
"The way to get started is to quit talking and begin doing."
Walt Disney
========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================
-------------- next part --------------
BLASTN 2.2.6 [Apr-09-2003]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= AEDES_05359.C
(821 letters)
Database: aaegypti.SCAFFOLDS-MASKED.AEDES1.fa
4758 sequences; 1,383,971,543 total letters
Searching..........done
Score E
Sequences producing significant alignments: (bits) Value
supercontig:1:supercont1.60:1:2993848:1 supercontig supercont1.60 779 0.0
>supercontig:1:supercont1.60:1:2993848:1 supercontig supercont1.60
Length = 2993848
Score = 779 bits (393), Expect = 0.0
Identities = 393/393 (100%)
Strand = Plus / Minus
Query: 336 cctttcatttttacggtgaccttcaccatcggcttctgatgacggcaaaaacgtgtgtgc 395
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2976894 cctttcatttttacggtgaccttcaccatcggcttctgatgacggcaaaaacgtgtgtgc 2976835
Query: 396 ctaagttacatgtgccaaaagtttctatttctaccgagtcttgcgtcgtgtgtcgtgagt 455
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2976834 ctaagttacatgtgccaaaagtttctatttctaccgagtcttgcgtcgtgtgtcgtgagt 2976775
Query: 456 gaagattgggaagagaacgaaagcctactaaaagcttttttggcatggtgacaagtctcc 515
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2976774 gaagattgggaagagaacgaaagcctactaaaagcttttttggcatggtgacaagtctcc 2976715
Query: 516 acgtcttgcgaaatggcgtttccttttatagccacgggtgttcccacacgttttggggta 575
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2976714 acgtcttgcgaaatggcgtttccttttatagccacgggtgttcccacacgttttggggta 2976655
Query: 576 ctgtcgggagtagttgctatacgttcaacaggtttaattttgccttgtccgacatgaact 635
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2976654 ctgtcgggagtagttgctatacgttcaacaggtttaattttgccttgtccgacatgaact 2976595
Query: 636 ttttcgggttgtccaggtgtaggagttgcagctacgagttggcgcaacaggaatgagatc 695
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2976594 ttttcgggttgtccaggtgtaggagttgcagctacgagttggcgcaacaggaatgagatc 2976535
Query: 696 attttgttttgttctggcgggattcttactgct 728
|||||||||||||||||||||||||||||||||
Sbjct: 2976534 attttgttttgttctggcgggattcttactgct 2976502
Score = 726 bits (366), Expect = 0.0
Identities = 388/394 (98%), Gaps = 1/394 (0%)
Strand = Plus / Minus
Query: 336 cctttcatttttacggtgaccttcaccatcggcttctgatgacggcaaaaacgtgtgtgc 395
|||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||
Sbjct: 2955826 cctttcatttttacggtgaccttcaccatcggcttctgatgacgacaaaaacgtgtgtgt 2955767
Query: 396 ctaagttacatgtgccaaaagtttctatttctaccgagtcttgcgtcgtgtgtcgtgagt 455
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2955766 ctaagttacatgtgccaaaagtttctatttctaccgagtcttgcgtcgtgtgtcgtgagt 2955707
Query: 456 gaagattgggaagagaacgaaagcctactaaaagcttttttggcatggtgacaagtctcc 515
| | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2955706 ggaaattgggaagagaacgaaagcctactaaaagcttttttggcatggtgacaagtctcc 2955647
Query: 516 acgtcttgcgaaatggcgtttccttttatagccacgggtgttcccacacg-ttttggggt 574
|||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||
Sbjct: 2955646 acgtcttgcgaaatggcgtttccttttatagccacgggtgttcccacacgtttttggggt 2955587
Query: 575 actgtcgggagtagttgctatacgttcaacaggtttaattttgccttgtccgacatgaac 634
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2955586 actgtcgggagtagttgctatacgttcaacaggtttaattttgccttgtccgacatgaac 2955527
Query: 635 tttttcgggttgtccaggtgtaggagttgcagctacgagttggcgcaacaggaatgagat 694
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2955526 tttttcgggttgtccaggtgtaggagttgcagctacgagttggcgcaacaggaatgagat 2955467
Query: 695 cattttgttttgttctggcgggattcttactgct 728
|||||||||||||||||| |||||||||||||||
Sbjct: 2955466 cattttgttttgttctggggggattcttactgct 2955433
Score = 630 bits (318), Expect = e-178
Identities = 333/338 (98%)
Strand = Plus / Minus
Query: 1 gaaactttgtaattaagtgtaaaatatctgcctatctgtgaatttcgccagactatcaat 60
|||||||||||||||||||||||||| ||||||| |||||||||||||||||||||||||
Sbjct: 2966288 gaaactttgtaattaagtgtaaaatacctgcctacctgtgaatttcgccagactatcaat 2966229
Query: 61 ccatggttaacttttgtcctatcgtcaagatatagtttacaaagatagattattgattat 120
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2966228 ccatggttaacttttgtcctatcgtcaagatatagtttacaaagatagattattgattat 2966169
Query: 121 tgatcttaccaagaaacttgttgattacttcgatcgagacctggaatgattgcacacaca 180
|||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2966168 tgatcttacccagaaacttgttgattacttcgatcgagacctggaatgattgcacacaca 2966109
Query: 181 gcaatgctctgacacctacttcttcgtacaatatttctgcctctttgttatcatcgtctt 240
||||||||||||||||||||||||||||||||||||||||||||| ||||||||| ||||
Sbjct: 2966108 gcaatgctctgacacctacttcttcgtacaatatttctgcctcttcgttatcatcatctt 2966049
Query: 241 cgtgggcaattgggtccgaaccctccgaatcaaatttgtcgggctctacttctcttttgg 300
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2966048 cgtgggcaattgggtccgaaccctccgaatcaaatttgtcgggctctacttctcttttgg 2965989
Query: 301 gattcatagtggcttgacctcaagcgctaattaatcct 338
||||||||||||||||||||||||||||||||||||||
Sbjct: 2965988 gattcatagtggcttgacctcaagcgctaattaatcct 2965951
Score = 486 bits (245), Expect = e-135
Identities = 245/245 (100%)
Strand = Plus / Minus
Query: 94 agtttacaaagatagattattgattattgatcttaccaagaaacttgttgattacttcga 153
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2980127 agtttacaaagatagattattgattattgatcttaccaagaaacttgttgattacttcga 2980068
Query: 154 tcgagacctggaatgattgcacacacagcaatgctctgacacctacttcttcgtacaata 213
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2980067 tcgagacctggaatgattgcacacacagcaatgctctgacacctacttcttcgtacaata 2980008
Query: 214 tttctgcctctttgttatcatcgtcttcgtgggcaattgggtccgaaccctccgaatcaa 273
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2980007 tttctgcctctttgttatcatcgtcttcgtgggcaattgggtccgaaccctccgaatcaa 2979948
Query: 274 atttgtcgggctctacttctcttttgggattcatagtggcttgacctcaagcgctaatta 333
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2979947 atttgtcgggctctacttctcttttgggattcatagtggcttgacctcaagcgctaatta 2979888
Query: 334 atcct 338
|||||
Sbjct: 2979887 atcct 2979883
Score = 194 bits (98), Expect = 3e-47
Identities = 98/98 (100%)
Strand = Plus / Minus
Query: 1 gaaactttgtaattaagtgtaaaatatctgcctatctgtgaatttcgccagactatcaat 60
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2982763 gaaactttgtaattaagtgtaaaatatctgcctatctgtgaatttcgccagactatcaat 2982704
Query: 61 ccatggttaacttttgtcctatcgtcaagatatagttt 98
||||||||||||||||||||||||||||||||||||||
Sbjct: 2982703 ccatggttaacttttgtcctatcgtcaagatatagttt 2982666
Score = 145 bits (73), Expect = 2e-32
Identities = 87/94 (92%)
Strand = Plus / Minus
Query: 727 cttaaaaaattctacnnnnnnngtttacaatatcaaaactacagtcgacacacatatttt 786
||||||||||||||| ||||||||||||||||||||||||||||||||||||||
Sbjct: 2976416 cttaaaaaattctactttttttgtttacaatatcaaaactacagtcgacacacatatttt 2976357
Query: 787 gttaatttgtaggtgttgcttcgattcatcttca 820
||||||||||||||||||||||||||||||||||
Sbjct: 2976356 gttaatttgtaggtgttgcttcgattcatcttca 2976323
Score = 71.9 bits (36), Expect = 3e-10
Identities = 42/44 (95%)
Strand = Plus / Minus
Query: 778 acatattttgttaatttgtaggtgttgcttcgattcatcttcac 821
||||||||||||||||||||||||| ||||||||||||| ||||
Sbjct: 2955299 acatattttgttaatttgtaggtgtggcttcgattcatcatcac 2955256
>supercontig:1:supercont1.971:1:313087:1 supercontig supercont1.971
Length = 313087
Score = 139 bits (70), Expect = 1e-30
Identities = 103/114 (90%)
Strand = Plus / Plus
Query: 192 acacctacttcttcgtacaatatttctgcctctttgttatcatcgtcttcgtgggcaatt 251
|||| ||||||||||||| ||||||||||||||| ||||||||||||||||||||| |||
Sbjct: 202647 acacttacttcttcgtacgatatttctgcctcttcgttatcatcgtcttcgtgggccatt 202706
Query: 252 gggtccgaaccctccgaatcaaatttgtcgggctctacttctcttttgggattc 305
|||||||| ||||||||| ||| ||| |||||||||| |||| ||| |||||||
Sbjct: 202707 gggtccgatccctccgaagcaattttttcgggctctatttctttttcgggattc 202760
Score = 71.9 bits (36), Expect = 3e-10
Identities = 48/52 (92%)
Strand = Plus / Plus
Query: 123 atcttaccaagaaacttgttgattacttcgatcgagacctggaatgattgca 174
|||||||||||||||||||| ||||| || ||||||||| ||||||||||||
Sbjct: 202569 atcttaccaagaaacttgtttattacgtctatcgagacccggaatgattgca 202620
Score = 71.9 bits (36), Expect = 3e-10
Identities = 72/84 (85%)
Strand = Plus / Plus
Query: 22 aaatatctgcctatctgtgaatttcgccagactatcaatccatggttaacttttgtccta 81
||||| ||||||| |||||| |||||||| || ||||||||||| ||||||||||||||
Sbjct: 202376 aaataactgcctacctgtgattttcgccatacattcaatccatggataacttttgtccta 202435
Query: 82 tcgtcaagatatagtttacaaaga 105
|| ||||||| ||| |||||||
Sbjct: 202436 tctccaagatacggttcacaaaga 202459
Database: aaegypti.SCAFFOLDS-MASKED.AEDES1.fa
Posted date: Nov 6, 2006 5:26 PM
Number of letters in database: 1,383,971,543
Number of sequences in database: 4758
Lambda K H
1.37 0.711 1.31
Gapped
Lambda K H
1.37 0.711 1.31
Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Hits to DB: 758,150
Number of Sequences: 4758
Number of extensions: 758150
Number of successful extensions: 7086
Number of sequences better than 1.0e-05: 4
Number of HSP's better than 0.0 without gapping: 4
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 7012
Number of HSP's gapped (non-prelim): 73
length of query: 821
length of database: 1,383,971,543
effective HSP length: 20
effective length of query: 801
effective length of database: 1,383,876,383
effective search space: 1108484982783
effective search space used: 1108484982783
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 29 (58.0 bits)