[Bioperl-l] Can't parse blast report written by Bio::SearchIO::Writer::TextResultWriter
Prachi Shah
prachi at stanford.edu
Thu May 8 20:54:06 UTC 2008
Hi all,
I am trying to order of HSPs within each BLAST Hit in the order of
ascending P-values. So, I parse my WU-BLAST report using Bio::SearchIO
and create new Result, Hit and HSP objects in the order and then write
out another BLAST report with the
Bio::SearchIO::Writer::TextResultWriter module. All this works fine.
But, when I try to parse this new blast report with
Bio::SearchIO::blast, I get the following error:
------------- EXCEPTION -------------
MSG: no data for midline Query: 0 1
STACK Bio::SearchIO::blast::next_result
/tools/perl/5.6.1/lib/site_perl/5.6.1/Bio/SearchIO/blast.pm:1151
STACK toplevel bin/testBlastParse.pl:12
--------------------------------------
I have copied below sample sections of both blast reports and the
code. Any hints/ pointers/ suggestions are greatly appreciated.
Thanks,
Prachi
The old vs new blast reports look slightly different, esp. note the
HSP start and stop coordinates for the QUERY sequence.
**Snippet of OLD blast report (generated by WU-BLAST):
----------------------------------------------------------------------------------------------------
Query= orf19.4890
(4931 letters)
Database: Ca21_Chromosomes
9 sequences; 14,324,492 total letters.
Searching....10....20....30....40....50....60....70....80....90....100% done
WARNING: hspmax=1000 was exceeded by 8 of the database sequences, causing the
associated cutoff score, S2, to be transiently set as high as 113.
Smallest
Sum
High Probability
Sequences producing High-scoring Segment Pairs: Score P(N) N
Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides) 24655 0. 1
Ca21chr5 Assembly 21, Ca21chr5 (1190941 nucleotides) 1682 3.4e-68 3
Ca21chr6 Assembly 21, Ca21chr6 (1033553 nucleotides) 908 3.0e-34 3
Ca21chr2 Assembly 21, Ca21chr2 (2232049 nucleotides) 859 4.7e-30 1
Ca21chr7 Assembly 21, Ca21chr7 (949626 nucleotides) 492 7.3e-24 3
Ca21chr4 Assembly 21, Ca21chr4 (1603475 nucleotides) 528 9.8e-21 2
Ca21chrR Assembly 21, Ca21chrR (2286425 nucleotides) 520 1.4e-19 5
Ca21chr3 Assembly 21, Ca21chr3 (1799426 nucleotides) 502 1.7e-14 2
Ca19-mtDNA Assembly 19, Ca19-mtDNA (40420 nucleotides) 313 2.9e-06 2
>Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)
Length = 3,188,577
Plus Strand HSPs:
Score = 506 (82.0 bits), Expect = 4.9e-14, P = 4.9e-14
Identities = 850/1549 (54%), Positives = 850/1549 (54%), Strand = Plus / Plus
Query: 3450 ATGCATATGGTAATGTTAA-AATCACTGATTTTGGA-TTTTGTGCTAAATTAAC-T-GAT 3505
| | ||| | | || |||| ||| ||||| ||| | ||||| || | || | | | |
Sbjct: 155924 AGGGATACGATTAT-TTAAGAATT-CTGATATTGAAATTTTG-GC-ATTTTCATATAGTT
155979
Query: 3506 CAAAGA--AATAAACGTGCC-ACAATGGTGGGGACACCATATTGG-ATGGCACCTGAAGT 3561
|||| | |||||| | | |||| || | ||| | | ||| | | | |
Sbjct: 155980 CAAACATTAATAAATATATTGAAAATGTTGATTTAATCAT-TAGTCATG---CTGGTACT
156035
Query: 3562 GGTTAAACAAAAGGAATATGATGAAAAAGTTGATGTTTGGTCATTGGGGATTATGACTAT 3621
|| | || | | || || | | | |||| | |||| |||| ||
Sbjct: 156036 GGATCAATCATTG--AT-TGTTTACAT--TTGAA--TAAACCATTAATTGTTATTGTTAA
156088
Query: 3622 TGAAATGATTGAAGGAGAACCACCTTATTTGAA-T-GAAGAACCATTAAAAGCATTATAT 3679
----------------------------------------------------------------------------------------------------
**Snippet of NEW blast report (generated using
Bio::SearchIO::Writer::TextResultWriter)
----------------------------------------------------------------------------------------------------
uery= orf19.4890
(4,931 letters)
Database: Ca21_Chromosomes
9 sequences; 14,324,492 total letters
Score E
Sequences producing significant alignments: (bits) value
Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides) 24655 0.
Ca21chr5 Assembly 21, Ca21chr5 (1190941 nucleotides)
1682 3.4e-68
Ca21chr6 Assembly 21, Ca21chr6 (1033553 nucleotides)
908 3.0e-34
Ca21chr2 Assembly 21, Ca21chr2 (2232049 nucleotides)
859 4.7e-30
Ca21chr7 Assembly 21, Ca21chr7 (949626 nucleotides)
492 7.3e-24
Ca21chr4 Assembly 21, Ca21chr4 (1603475 nucleotides)
528 9.8e-21
Ca21chrR Assembly 21, Ca21chrR (2286425 nucleotides)
520 1.4e-19
Ca21chr3 Assembly 21, Ca21chr3 (1799426 nucleotides)
502 1.7e-14
Ca19-mtDNA Assembly 19, Ca19-mtDNA (40420 nucleotides)
313 2.9e-06
>Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)
Length = 3188577
Score = 3705.3 bits (24655), Expect = 0., P = 0.
Identities = 4931/4931 (100%)
Frame = -1 / +1
Query: 1 ATAAAGGATGCCAAATAGTAGTAGTAAAATAGTAAATAGAATTGCAAAACAAAAATGATT -58
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248574 ATAAAGGATGCCAAATAGTAGTAGTAAAATAGTAAATAGAATTGCAAAACAAAAATGATT
2248633
Query: -59 AAATAGCCCTTTATCAATAAATTTTTAAAGTTAGTTTCTTCTGGAACCCTACCCTCTTGG -118
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248634 AAATAGCCCTTTATCAATAAATTTTTAAAGTTAGTTTCTTCTGGAACCCTACCCTCTTGG
2248693
Query: -119 TGTTAATCTTTTAAGTTAATATTTATAGTTAATAAAGTAGAAGTGTCTATTTATTGATTG -178
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248694 TGTTAATCTTTTAAGTTAATATTTATAGTTAATAAAGTAGAAGTGTCTATTTATTGATTG
2248753
Query: -179 TTGTTGTTGTTGATTAAGAATATAAAGAAAAACAGAAAAGAAAAAAAGAAGGTTTAAAAA -238
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248754 TTGTTGTTGTTGATTAAGAATATAAAGAAAAACAGAAAAGAAAAAAAGAAGGTTTAAAAA
2248813
Query: -239 AGTTAATTGTGAAGTAAAAGGGTTGAAAAATTTTTTTTTTTTCTGTTTCTCTCTTTGAGA -298
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248814 AGTTAATTGTGAAGTAAAAGGGTTGAAAAATTTTTTTTTTTTCTGTTTCTCTCTTTGAGA
2248873
Query: -299 TTCTTTGACATATTTATTATTATAACACTATGCTATACTAAAAACAGTACTACCAATTGA -358
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248874 TTCTTTGACATATTTATTATTATAACACTATGCTATACTAAAAACAGTACTACCAATTGA
2248933
Query: -359 ATTAAATTAAATTAAATTAAATTAAATTATTAGACCAATTTCAATAAAGATAAGCAATTT -418
----------------------------------------------------------------------------------------------------
**Here is the snippet of code that reads the old report, generates new
objects and writes new report:
----------------------------------------------------------------------------------------------------
my $blast_report = Bio::SearchIO->new(-format => 'blast',
-file => $blastOutputTmp);
my $writer =
Bio::SearchIO::Writer::TextResultWriter->new(-no_wublastlinks => 0);
my $out_blast_report = Bio::SearchIO->new(-writer => $writer,
-file => ">$blastOutputFile");
my $sorted_blast_report;
while( my $result = $blast_report->next_result ) {
my (%parameters, %statistics);
foreach my $param ($result->available_parameters) {
$parameters{$param} = $result->get_parameter($param);
}
foreach my $stat ($result->available_statistics) {
$statistics{$stat} = $result->get_statistic($stat);
}
my $generic_result =
Bio::Search::Result::BlastResult->new(-query_name =>
$result->query_name,
-query_length =>
$result->query_length,
-database_name =>
$result->database_name,
-database_entries =>
$result->database_entries,
-parameters => \%parameters,
-statistics => \%statistics,
-algorithm => $result->algorithm,
-query_description =>
$result->query_description,
-algorithm_reference =>
$result->algorithm_reference,
-algorithm_version =>
$result->algorithm_version,
-database_letters =>
$result->database_letters);
while( my $hit = $result->next_hit ) {
my $generic_hit = Bio::Search::Hit::BlastHit->new(-name
=> $hit->name,
-algorithm => $hit->algorithm,
-description => $hit->description,
-length => $hit->length,
-score => $hit->score,
-bits => $hit->bits,
-significance => $hit->significance);
my (@hsp_sorted, @hsps);
while( my $hsp = $hit->next_hsp ) {
push(@hsps, $hsp);
}
@hsp_sorted = sort {$a->pvalue <=> $b->pvalue} @hsps;
for(my $i=0; $i<=$#hsp_sorted; $i++) {
$generic_hit->add_hsp($hsp_sorted[$i]);
}
$generic_result->add_hit($generic_hit);
}
$out_blast_report->write_result($generic_result);
}
----------------------------------------------------------------------------------------------------
More information about the Bioperl-l
mailing list