[Bioperl-l] Blast parsing question
Chris Strassel
CStrassel@genomecorp.com
Thu, 25 Apr 2002 08:40:27 -0400
I'm not sure whether this is a known problem (I've been out of touch for a while).
I'm trying to parse the following BLAST output:
... Sum
High Probability
Sequences producing High-scoring Segment Pairs: Score P(N) N
[GI:1131572] [LN:G14809] [AC:G14809] [OR:Homo sapiens] [D... 996 7.4e-40 1
[GI:9392631] [LN:AF257078] [AC:AF257078] [OR:Homo sapiens... 462 2.2e-15 1
[GI:2687802] [LN:HS179N16T] [AC:AL020972] [OR:Homo sapien... 427 4.4e-14 1
[GI:605020] [LN:HUMUT6640] [AC:L30574] [OR:Homo sapiens] ... 396 3.4e-13 1
[GI:3168695] [LN:G38121] [AC:G38121] [OR:Homo sapiens] [D... 388 8.7e-13 1
[GI:605053] [LN:HUMUT7241] [AC:L30409] [OR:Homo sapiens] ... 379 2.4e-12 1
[GI:2734402] [LN:G36735] [AC:G36735] [OR:Homo sapiens] [D... 377 3.4e-12 1
[GI:1113732] [LN:HUMSWS3328] [AC:G13119] [OR:Homo sapiens... 366 1.4e-11 1
[GI:12025508] [LN:G67450] [AC:G67450] [OR:Homo sapiens] [... 362 1.9e-11 1
[GI:2996755] [LN:G37104] [AC:G37104] [OR:Homo sapiens] [D... 352 6.0e-11 1
[GI:1052377] [LN:HSB311WC5] [AC:Z67594] [OR:Homo sapiens]... 354 6.5e-11 1
[GI:6124528] [LN:G59359] [AC:G59359] [OR:Homo sapiens] [D... 341 1.3e-10 1
[GI:7161555] [LN:HSC60H06] [AC:AL158439] [OR:Homo sapiens... 338 2.6e-10 1
[GI:308693] [LN:HUMUT887] [AC:L18457] [OR:Homo sapiens] [... 321 1.0e-09 1
[GI:1526788] [LN:G28895] [AC:G28895] [OR:Homo sapiens] [D... 324 1.3e-09 1
[GI:9794942] [LN:G66526] [AC:G66526] [OR:Homo sapiens] [D... 313 3.4e-09 1
[GI:1396225] [LN:G27506] [AC:G27506] [OR:Homo sapiens] [D... 315 3.7e-09 1
[GI:1347049] [LN:G24817] [AC:G24817] [OR:Homo sapiens] [D... 315 5.4e-09 1
[GI:6124561] [LN:G59392] [AC:G59392] [OR:Homo sapiens] [D... 300 1.5e-08 1
[GI:938428] [LN:G07878] [AC:G07878] [OR:Homo sapiens] [DE... 300 1.6e-08 1
WARNING: Descriptions of 16 database sequences were not reported due to the
limiting value of parameter V = 20.
>[GI:1131572] [LN:G14809] [AC:G14809] [OR:Homo sapiens] [DE:SHGC-13583 Human
Homo sapiens STS genomic, sequence tagged site] [KW:STS]
[PT:Unpublished, Olivier, M., Cox, D.R. (2000)] [JO:Unpublished]
[DB:genabnk-sts1]
Length = 250
Minus Strand HSPs:
Score = 996 (155.5 bits), Expect = 7.4e-40, P = 7.4e-40
Identities = 200/202 (99%), Positives = 200/202 (99%), Strand = Minus / Plus
Query: 202 TTGCATATGGACATACAATTGTTCTAGAATCATTTGTTGAAAAGGTTGTCCATTCTCCAC 143
|||||||||||||||||||| |||||||||||||| ||||||||||||||||||||||||
Sbjct: 1 TTGCATATGGACATACAATTNTTCTAGAATCATTTNTTGAAAAGGTTGTCCATTCTCCAC 60
...
Using the following code:
my $blast = Bio::SearchIO->new('-format' => 'blast',
                               '-file'   => $file);
# Now get the best hit from the blast search and check
# to see whether its score meets the criteria for keeping.
my $result = $blast->next_result;
my $hit = $result->next_hit;
my $name = $hit->name;
print "For file $file, the best hit is $name.\n";
And I get this error:
------------- EXCEPTION -------------
MSG: no data for midline WARNING: HSPs involving 16 database sequences were not reported due to the
STACK Bio::SearchIO::blast::next_result /usr/home/cstrasse/src/bioperl-1.0/Bio/SearchIO/blast.pm:486
STACK main::check_seq ../add_blast.pl:88
STACK toplevel ../add_blast.pl:40
--------------------------------------
It looks like the parser does not handle the WARNING line, but for my purposes the BLAST report is fine as it stands. Any suggestions?
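In the meantime, one workaround I could try is to strip the WARNING blocks out of the report before handing it to the parser. The following is only an untested-against-BioPerl sketch: strip_blast_warnings is a hypothetical helper of my own, not part of BioPerl, and it assumes that every warning starts with "WARNING:" and that its continuation lines run up to the next blank line. I have not confirmed that removing these lines is enough to make Bio::SearchIO accept the report.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): remove each "WARNING:" line and
# the continuation lines that follow it, up to the next blank line.
# Assumption: a warning block is always terminated by a blank line.
sub strip_blast_warnings {
    my ($report) = @_;
    my @kept;
    my $in_warning = 0;
    for my $line (split /\n/, $report, -1) {
        if ($line =~ /^\s*WARNING:/) {
            $in_warning = 1;      # start of a warning block: drop this line
            next;
        }
        if ($in_warning) {
            if ($line =~ /^\s*$/) {
                $in_warning = 0;  # blank line ends the warning block; keep it
            } else {
                next;             # continuation line of the warning: drop it
            }
        }
        push @kept, $line;
    }
    return join "\n", @kept;
}

# Small demonstration on a fragment shaped like the report above.
my $sample = <<'END';
[GI:938428] [LN:G07878] [AC:G07878] [OR:Homo sapiens] [DE... 300 1.6e-08 1

WARNING: Descriptions of 16 database sequences were not reported due to the
limiting value of parameter V = 20.

>[GI:1131572] [LN:G14809] [AC:G14809] [OR:Homo sapiens] [DE:SHGC-13583 Human
END

my $clean = strip_blast_warnings($sample);
print $clean;

# The cleaned text could then be parsed through an in-memory filehandle
# (needs "use Bio::SearchIO;" and a Perl with in-memory filehandle support):
#   open my $fh, '<', \$clean or die "Cannot open in-memory handle: $!";
#   my $blast = Bio::SearchIO->new('-format' => 'blast', '-fh' => $fh);
```

This leaves the file on disk untouched and only changes what the parser sees. If a warning is ever followed directly by report text with no blank line in between, this filter would drop that text too, so it is a stopgap rather than a fix for the parser.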
Thanks,
Chris