[Bioperl-l] Parsing strand and frame for blastx results using SearchIO
michael watson (IAH-C)
michael.watson at bbsrc.ac.uk
Wed Jul 23 15:49:21 UTC 2008
Hello
Am currently using bioperl-1.5.2_102 and BLASTX 2.2.17 [Aug-26-2007],
but this also holds true in 1.5.1 and 1.4.0.
I have a results file that is pasted below, and a script that is also
pasted below - just wondering how searchIO thinks the strand is 0 and
the frame is 2?
Result:
BLASTX 2.2.17 [Aug-26-2007]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= lcl|chr2 42223489 - 44537930
(2,314,442 letters)
Database: ../chemokine_receptors_aa.txt
15 sequences; 5543 total letters
Searching..................................................done
Score
E
Sequences producing significant alignments: (bits)
Value
chCCRc
727 0.0
>chCCRc
Length = 377
Score = 727 bits (1877), Expect = 0.0
Identities = 362/377 (96%), Positives = 362/377 (96%)
Frame = -3
Query: 496563
MEQRKKLTGRHTRALYLWFFPSQESKMNPTDLFLSTTEYDYGYDENTAPCNEGNSFPRFK 496384
MEQRKKLTGRHTRALYLWFFPSQESKMNPTDLFLSTTEYDYGYDENTAPCNEGNSFPRFK
Sbjct: 1
MEQRKKLTGRHTRALYLWFFPSQESKMNPTDLFLSTTEYDYGYDENTAPCNEGNSFPRFK 60
Query: 496383
SLFLPILYCLVFVFCLLGNSLVLWILLTRKRLMTMTDICLLNLAASDLLFIVPLPFQAYY 496204
SLFLPILYCLVFVFCLLGNSLVLWILLTRKRLMTMTDICLLNLAASDLLFIVPLPFQAYY
Sbjct: 61
SLFLPILYCLVFVFCLLGNSLVLWILLTRKRLMTMTDICLLNLAASDLLFIVPLPFQAYY 120
Query: 496203
ASDQWVFGNALCKIMGGIYYTGFYSSIFFITLMSIDRYIAIVHAVYAMKIRTASCGTMIS 496024
ASDQWVFGNALCKIMGGIYYTGFYSSIFFITLMSIDRYIAIVHAVYAMKIRTASCGTMIS
Sbjct: 121
ASDQWVFGNALCKIMGGIYYTGFYSSIFFITLMSIDRYIAIVHAVYAMKIRTASCGTMIS 180
Query: 496023
LVLWLVAGLASVPNIVFNQQLEIEQSVQCVPVYPPGNNIWKVTTQFAANILGLLIPFSIL 495844
LVLWLVAGLASVPNIVFNQQLEIEQSVQCVPVYPPGNNIWKVTTQFAANILGLLIPFSIL
Sbjct: 181
LVLWLVAGLASVPNIVFNQQLEIEQSVQCVPVYPPGNNIWKVTTQFAANILGLLIPFSIL 240
Query: 495843
IHCYAQILRNLRKCKNQNXXXXXXXXXXXXXXXFLFWTPFNVVLFLDSLQSLLIIDNCQA 495664
IHCYAQILRNLRKCKNQN
FLFWTPFNVVLFLDSLQSLLIIDNCQA
Sbjct: 241
IHCYAQILRNLRKCKNQNKIKAIKMIFIIVIVFFLFWTPFNVVLFLDSLQSLLIIDNCQA 300
Query: 495663
SSQITLALQLTETISFIHCCLNPVIYAFAGVTFKAHLKRLLQPCARILWSPTRGSGVTQS 495484
SSQITLALQLTETISFIHCCLNPVIYAFAGVTFKAHLKRLLQPCARILWSPTRGSGVTQS
Sbjct: 301
SSQITLALQLTETISFIHCCLNPVIYAFAGVTFKAHLKRLLQPCARILWSPTRGSGVTQS 360
Query: 495483 SLVLSQISGCSDSAGVL 495433
SLVLSQISGCSDSAGVL
Sbjct: 361 SLVLSQISGCSDSAGVL 377
Database: ../chemokine_receptors_aa.txt
Posted date: Jul 23, 2008 2:43 PM
Number of letters in database: 5543
Number of sequences in database: 15
Lambda K H
0.318 0.134 0.401
Gapped
Lambda K H
0.267 0.0410 0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 15
Number of Hits to DB: 28,029,184
Number of extensions: 721812
Number of successful extensions: 2651
Number of sequences better than 1.0e-05: 15
Number of HSP's gapped: 2358
Number of HSP's successfully gapped: 148
Length of query: 771480
Length of database: 5543
Length adjustment: 102
Effective length of query: 771378
Effective length of database: 4013
Effective search space: 3095539914
Effective search space used: 3095539914
Neighboring words threshold: 12
Window for multiple hits: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.7 bits)
S2: 33 (17.3 bits)
Script:
#!/usr/bin/perl
use lib "/usr/local/bioperl-1.5.2_102";
use Bio::SearchIO;
my $searchio = Bio::SearchIO->new(-format => 'blast', -file =>
"test.blast");
while (my($result) = $searchio->next_result()) {
unless (defined $result) {
# there is an undefined result at the end of SearchIO
objects
last;
}
while(my $hit = $result->next_hit) {
while (my $hsp = $hit->next_hsp) {
print $hit->name, "\t";
print $hsp->strand('hit'), "\t";
print $hsp->frame, "\n";
}
}
}
Output:
-bash-3.00$ perl parse_test_blast.pl
chCCRc 0 2
Head of Informatics
Institute for Animal Health
Compton
Berks
RG20 7NN
01635 578411
http://www.iah.ac.uk/research/bioinformatics/bioinf.shtml
The information contained in this message may be confidential or legally
privileged and is intended solely for the addressee.
If you have received this message in error please delete it & notify the
originator immediately.
Unauthorised use, disclosure, copying or alteration of this message is
forbidden & may be unlawful.
The contents of this e-mail are the views of the sender and do not
necessarily represent the views of the Institute.
This email and associated attachments has been checked locally for
viruses but we can accept no responsibility once it has left our
systems.
Communications on Institute computers are monitored to secure the
effective operation of the systems and for other lawful purposes.
More information about the Bioperl-l
mailing list