[Bioperl-l] Blast parsing question
Jason Stajich
jason@cgt.mc.duke.edu
Thu, 25 Apr 2002 08:58:18 -0400 (EDT)
This is fixed in CVS live code but not in the release - sorry I wasn't
initiated to the full breadth of wublast format fun.
-jason
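
For anyone who has to stay on the 1.0 release, one possible stopgap is to
strip the WARNING paragraphs out of the report before handing it to the
parser. The following is only a sketch, not the fix that went into CVS, and
it has not been checked against the 1.0 parser. The helper name
strip_blast_warnings is hypothetical (it is not part of BioPerl), and the
code assumes each warning starts on a line beginning with "WARNING:" and
runs until the next blank line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the report text with
# every WARNING paragraph removed. ASSUMPTION: a warning paragraph starts at
# a line beginning with "WARNING:" and continues until the next blank line.
sub strip_blast_warnings {
    my ($text) = @_;
    my @kept;
    my $in_warning = 0;
    for my $line (split /^/m, $text) {
        if ($line =~ /^\s*WARNING:/) {
            $in_warning = 1;        # first line of a warning paragraph
            next;
        }
        if ($in_warning) {
            if ($line =~ /^\s*$/) {
                $in_warning = 0;    # blank line ends the paragraph
            } else {
                next;               # continuation line of the warning
            }
        }
        push @kept, $line;
    }
    return join '', @kept;
}

# When given report files on the command line, print the filtered text.
if (@ARGV) {
    local $/;                       # slurp mode
    my $report = <>;
    print strip_blast_warnings($report);
}
```

Saved under a name of your choosing, the script can write a filtered copy of
the report, and that copy can then be passed to Bio::SearchIO->new with
'-file' exactly as in the code quoted below. Note that this also drops the
warning about unreported descriptions, so it only makes sense if, as in
Chris's case, the warnings themselves are of no interest.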
On Thu, 25 Apr 2002, Chris Strassel wrote:
> Not sure if this is a known problem or not (I've been out of touch for a while).
>
> I'm trying to parse the following blast output:
>
> ... Sum
> High Probability
> Sequences producing High-scoring Segment Pairs: Score P(N) N
>
> [GI:1131572] [LN:G14809] [AC:G14809] [OR:Homo sapiens] [D... 996 7.4e-40 1
> [GI:9392631] [LN:AF257078] [AC:AF257078] [OR:Homo sapiens... 462 2.2e-15 1
> [GI:2687802] [LN:HS179N16T] [AC:AL020972] [OR:Homo sapien... 427 4.4e-14 1
> [GI:605020] [LN:HUMUT6640] [AC:L30574] [OR:Homo sapiens] ... 396 3.4e-13 1
> [GI:3168695] [LN:G38121] [AC:G38121] [OR:Homo sapiens] [D... 388 8.7e-13 1
> [GI:605053] [LN:HUMUT7241] [AC:L30409] [OR:Homo sapiens] ... 379 2.4e-12 1
> [GI:2734402] [LN:G36735] [AC:G36735] [OR:Homo sapiens] [D... 377 3.4e-12 1
> [GI:1113732] [LN:HUMSWS3328] [AC:G13119] [OR:Homo sapiens... 366 1.4e-11 1
> [GI:12025508] [LN:G67450] [AC:G67450] [OR:Homo sapiens] [... 362 1.9e-11 1
> [GI:2996755] [LN:G37104] [AC:G37104] [OR:Homo sapiens] [D... 352 6.0e-11 1
> [GI:1052377] [LN:HSB311WC5] [AC:Z67594] [OR:Homo sapiens]... 354 6.5e-11 1
> [GI:6124528] [LN:G59359] [AC:G59359] [OR:Homo sapiens] [D... 341 1.3e-10 1
> [GI:7161555] [LN:HSC60H06] [AC:AL158439] [OR:Homo sapiens... 338 2.6e-10 1
> [GI:308693] [LN:HUMUT887] [AC:L18457] [OR:Homo sapiens] [... 321 1.0e-09 1
> [GI:1526788] [LN:G28895] [AC:G28895] [OR:Homo sapiens] [D... 324 1.3e-09 1
> [GI:9794942] [LN:G66526] [AC:G66526] [OR:Homo sapiens] [D... 313 3.4e-09 1
> [GI:1396225] [LN:G27506] [AC:G27506] [OR:Homo sapiens] [D... 315 3.7e-09 1
> [GI:1347049] [LN:G24817] [AC:G24817] [OR:Homo sapiens] [D... 315 5.4e-09 1
> [GI:6124561] [LN:G59392] [AC:G59392] [OR:Homo sapiens] [D... 300 1.5e-08 1
> [GI:938428] [LN:G07878] [AC:G07878] [OR:Homo sapiens] [DE... 300 1.6e-08 1
>
>
> WARNING: Descriptions of 16 database sequences were not reported due to the
> limiting value of parameter V = 20.
>
>
>
> >[GI:1131572] [LN:G14809] [AC:G14809] [OR:Homo sapiens] [DE:SHGC-13583 Human
> Homo sapiens STS genomic, sequence tagged site] [KW:STS]
> [PT:Unpublished, Olivier, M., Cox, D.R. (2000)] [JO:Unpublished]
> [DB:genabnk-sts1]
> Length = 250
>
> Minus Strand HSPs:
>
> Score = 996 (155.5 bits), Expect = 7.4e-40, P = 7.4e-40
> Identities = 200/202 (99%), Positives = 200/202 (99%), Strand = Minus / Plus
>
> Query:   202 TTGCATATGGACATACAATTGTTCTAGAATCATTTGTTGAAAAGGTTGTCCATTCTCCAC 143
>              |||||||||||||||||||| |||||||||||||| ||||||||||||||||||||||||
> Sbjct:     1 TTGCATATGGACATACAATTNTTCTAGAATCATTTNTTGAAAAGGTTGTCCATTCTCCAC 60
>
> ...
>
> Using the following code:
> my $blast = Bio::SearchIO->new('-format' => 'blast',
>                                '-file'   => $file);
>
> # Now get the best hit from the blast search and check
> # to see whether its score meets the criteria for keeping.
> my $result = $blast->next_result;
>
> my $hit = $result->next_hit;
> my $name = $hit->name;
>
> print "For file $file, the best hit is $name.\n";
>
> And I get this error:
> ------------- EXCEPTION -------------
> MSG: no data for midline WARNING: HSPs involving 16 database sequences were not reported due to the
> STACK Bio::SearchIO::blast::next_result /usr/home/cstrasse/src/bioperl-1.0/Bio/SearchIO/blast.pm:486
> STACK main::check_seq ../add_blast.pl:88
> STACK toplevel ../add_blast.pl:40
>
> --------------------------------------
>
> Looks like the parser doesn't like the Warning line, but for my purposes the blast report is fine. Any suggestions?
>
> Thanks,
> Chris
--
Jason Stajich
Duke University
jason@cgt.mc.duke.edu