[Bioperl-l] Can't parse blast report written by Bio::SearchIO::Writer::TextResultWriter

Prachi Shah prachi at stanford.edu
Thu May 8 20:54:06 UTC 2008


Hi all,

I am trying to order of HSPs within each BLAST Hit in the order of
ascending P-values. So, I parse my WU-BLAST report using Bio::SearchIO
and create new Result, Hit and HSP objects in the order and then write
out another BLAST report with the
Bio::SearchIO::Writer::TextResultWriter module. All this works fine.
But, when I try to parse this new blast report with
Bio::SearchIO::blast, I get the following error:

------------- EXCEPTION  -------------
MSG: no data for midline Query: 0       1
STACK Bio::SearchIO::blast::next_result
/tools/perl/5.6.1/lib/site_perl/5.6.1/Bio/SearchIO/blast.pm:1151
STACK toplevel bin/testBlastParse.pl:12
--------------------------------------

I have copied below sample sections of both blast reports and the
code. Any hints/ pointers/ suggestions are greatly appreciated.

Thanks,
Prachi



The old vs new blast reports look slightly different, esp. note the
HSP start and stop coordinates for the QUERY sequence.

**Snippet of OLD blast report (generated by WU-BLAST):
----------------------------------------------------------------------------------------------------
Query=  orf19.4890
        (4931 letters)

Database:  Ca21_Chromosomes
           9 sequences; 14,324,492 total letters.
Searching....10....20....30....40....50....60....70....80....90....100% done

WARNING:  hspmax=1000 was exceeded by 8 of the database sequences, causing the
          associated cutoff score, S2, to be transiently set as high as 113.

                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)         24655  0.        1
Ca21chr5 Assembly 21, Ca21chr5 (1190941 nucleotides)          1682  3.4e-68   3
Ca21chr6 Assembly 21, Ca21chr6 (1033553 nucleotides)           908  3.0e-34   3
Ca21chr2 Assembly 21, Ca21chr2 (2232049 nucleotides)           859  4.7e-30   1
Ca21chr7 Assembly 21, Ca21chr7 (949626 nucleotides)            492  7.3e-24   3
Ca21chr4 Assembly 21, Ca21chr4 (1603475 nucleotides)           528  9.8e-21   2
Ca21chrR Assembly 21, Ca21chrR (2286425 nucleotides)           520  1.4e-19   5
Ca21chr3 Assembly 21, Ca21chr3 (1799426 nucleotides)           502  1.7e-14   2
Ca19-mtDNA Assembly 19, Ca19-mtDNA (40420 nucleotides)         313  2.9e-06   2


>Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)
        Length = 3,188,577

  Plus Strand HSPs:

 Score = 506 (82.0 bits), Expect = 4.9e-14, P = 4.9e-14
 Identities = 850/1549 (54%), Positives = 850/1549 (54%), Strand = Plus / Plus

Query:   3450 ATGCATATGGTAATGTTAA-AATCACTGATTTTGGA-TTTTGTGCTAAATTAAC-T-GAT 3505
              | | ||| | | || |||| |||  ||||| ||| | ||||| || |  || |  | | |
Sbjct: 155924 AGGGATACGATTAT-TTAAGAATT-CTGATATTGAAATTTTG-GC-ATTTTCATATAGTT
155979

Query:   3506 CAAAGA--AATAAACGTGCC-ACAATGGTGGGGACACCATATTGG-ATGGCACCTGAAGT 3561
              |||| |  ||||||  |    | |||| ||     | ||| | |  |||   |  | | |
Sbjct: 155980 CAAACATTAATAAATATATTGAAAATGTTGATTTAATCAT-TAGTCATG---CTGGTACT
156035

Query:   3562 GGTTAAACAAAAGGAATATGATGAAAAAGTTGATGTTTGGTCATTGGGGATTATGACTAT 3621
              || | ||  |  |  || || | | |   ||||   |    ||||     ||||   ||
Sbjct: 156036 GGATCAATCATTG--AT-TGTTTACAT--TTGAA--TAAACCATTAATTGTTATTGTTAA
156088

Query:   3622 TGAAATGATTGAAGGAGAACCACCTTATTTGAA-T-GAAGAACCATTAAAAGCATTATAT 3679
----------------------------------------------------------------------------------------------------

**Snippet of NEW blast report (generated using
Bio::SearchIO::Writer::TextResultWriter)
----------------------------------------------------------------------------------------------------
uery= orf19.4890
       (4,931 letters)

Database: Ca21_Chromosomes
           9 sequences; 14,324,492 total letters

                                                                 Score       E
Sequences producing significant alignments:                      (bits)    value
Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)                24655  0.
Ca21chr5 Assembly 21, Ca21chr5 (1190941 nucleotides)
1682  3.4e-68
Ca21chr6 Assembly 21, Ca21chr6 (1033553 nucleotides)
908   3.0e-34
Ca21chr2 Assembly 21, Ca21chr2 (2232049 nucleotides)
859   4.7e-30
Ca21chr7 Assembly 21, Ca21chr7 (949626 nucleotides)
492   7.3e-24
Ca21chr4 Assembly 21, Ca21chr4 (1603475 nucleotides)
528   9.8e-21
Ca21chrR Assembly 21, Ca21chrR (2286425 nucleotides)
520   1.4e-19
Ca21chr3 Assembly 21, Ca21chr3 (1799426 nucleotides)
502   1.7e-14
Ca19-mtDNA Assembly 19, Ca19-mtDNA (40420 nucleotides)
313   2.9e-06


>Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)
         Length = 3188577

 Score = 3705.3 bits (24655), Expect = 0., P = 0.
 Identities = 4931/4931 (100%)
 Frame =  -1 / +1

Query: 1       ATAAAGGATGCCAAATAGTAGTAGTAAAATAGTAAATAGAATTGCAAAACAAAAATGATT -58
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248574 ATAAAGGATGCCAAATAGTAGTAGTAAAATAGTAAATAGAATTGCAAAACAAAAATGATT
2248633

Query: -59     AAATAGCCCTTTATCAATAAATTTTTAAAGTTAGTTTCTTCTGGAACCCTACCCTCTTGG -118
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248634 AAATAGCCCTTTATCAATAAATTTTTAAAGTTAGTTTCTTCTGGAACCCTACCCTCTTGG
2248693

Query: -119    TGTTAATCTTTTAAGTTAATATTTATAGTTAATAAAGTAGAAGTGTCTATTTATTGATTG -178
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248694 TGTTAATCTTTTAAGTTAATATTTATAGTTAATAAAGTAGAAGTGTCTATTTATTGATTG
2248753

Query: -179    TTGTTGTTGTTGATTAAGAATATAAAGAAAAACAGAAAAGAAAAAAAGAAGGTTTAAAAA -238
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248754 TTGTTGTTGTTGATTAAGAATATAAAGAAAAACAGAAAAGAAAAAAAGAAGGTTTAAAAA
2248813

Query: -239    AGTTAATTGTGAAGTAAAAGGGTTGAAAAATTTTTTTTTTTTCTGTTTCTCTCTTTGAGA -298
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248814 AGTTAATTGTGAAGTAAAAGGGTTGAAAAATTTTTTTTTTTTCTGTTTCTCTCTTTGAGA
2248873

Query: -299    TTCTTTGACATATTTATTATTATAACACTATGCTATACTAAAAACAGTACTACCAATTGA -358
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2248874 TTCTTTGACATATTTATTATTATAACACTATGCTATACTAAAAACAGTACTACCAATTGA
2248933

Query: -359    ATTAAATTAAATTAAATTAAATTAAATTATTAGACCAATTTCAATAAAGATAAGCAATTT -418

----------------------------------------------------------------------------------------------------

**Here is the snippet of code that reads the old report, generates new
objects and writes new report:
----------------------------------------------------------------------------------------------------
    my $blast_report = Bio::SearchIO->new(-format => 'blast',
                      -file   => $blastOutputTmp);

    my $writer =
Bio::SearchIO::Writer::TextResultWriter->new(-no_wublastlinks => 0);
    my $out_blast_report = Bio::SearchIO->new(-writer => $writer,
                          -file   => ">$blastOutputFile");

    my $sorted_blast_report;

    while( my $result = $blast_report->next_result ) {

        my (%parameters, %statistics);

    foreach my $param ($result->available_parameters) {

        $parameters{$param} = $result->get_parameter($param);
    }

    foreach my $stat ($result->available_statistics) {

        $statistics{$stat} = $result->get_statistic($stat);
    }

        my $generic_result =
Bio::Search::Result::BlastResult->new(-query_name          =>
$result->query_name,
                                   -query_length        =>
$result->query_length,
                                   -database_name       =>
$result->database_name,
                                   -database_entries    =>
$result->database_entries,
                                   -parameters          => \%parameters,
                                   -statistics          => \%statistics,
                                   -algorithm           => $result->algorithm,
                                   -query_description   =>
$result->query_description,
                                   -algorithm_reference =>
$result->algorithm_reference,
                                   -algorithm_version   =>
$result->algorithm_version,
                                   -database_letters    =>
$result->database_letters);

        while( my $hit = $result->next_hit ) {

        my $generic_hit = Bio::Search::Hit::BlastHit->new(-name
 => $hit->name,
                                  -algorithm    => $hit->algorithm,
                                  -description  => $hit->description,
                                  -length       => $hit->length,
                                  -score        => $hit->score,
                                  -bits         => $hit->bits,
                                  -significance => $hit->significance);

        my (@hsp_sorted, @hsps);
            while( my $hsp = $hit->next_hsp ) {

            push(@hsps, $hsp);
        }

        @hsp_sorted = sort {$a->pvalue <=> $b->pvalue} @hsps;

        for(my $i=0; $i<=$#hsp_sorted; $i++) {

            $generic_hit->add_hsp($hsp_sorted[$i]);

        }

        $generic_result->add_hit($generic_hit);

    }

    $out_blast_report->write_result($generic_result);

    }
----------------------------------------------------------------------------------------------------



More information about the Bioperl-l mailing list