[Bioperl-l] Parsing strand and frame for blastx results using SearchIO
Roy Chaudhuri
roy.chaudhuri at gmail.com
Wed Jul 23 16:13:41 UTC 2008
There's an explanation in the BioPerl FAQ:
http://www.bioperl.org/wiki/FAQ#How_do_I_get_the_frame_for_a_translated_search.3F
The problem with the strand being reported as 0 is because you are
asking for the strand of the hit (protein) rather than the query (DNA).
Roy.
--
Dr. Roy Chaudhuri
Department of Veterinary Medicine
University of Cambridge, U.K.
michael watson (IAH-C) wrote:
> Hello
>
> Am currently using bioperl-1.5.2_102 and BLASTX 2.2.17 [Aug-26-2007],
> but this also holds true in 1.5.1 and 1.4.0.
>
> I have a results file that is pasted below, and a script that is also
> pasted below - just wondering how searchIO thinks the strand is 0 and
> the frame is 2?
>
> Result:
> BLASTX 2.2.17 [Aug-26-2007]
>
>
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
> Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs", Nucleic Acids Res. 25:3389-3402.
>
> Query= lcl|chr2 42223489 - 44537930
> (2,314,442 letters)
>
> Database: ../chemokine_receptors_aa.txt
> 15 sequences; 5543 total letters
>
> Searching..................................................done
>
>
>
> Score
> E
> Sequences producing significant alignments: (bits)
> Value
>
> chCCRc
> 727 0.0
>
>> chCCRc
> Length = 377
>
> Score = 727 bits (1877), Expect = 0.0
> Identities = 362/377 (96%), Positives = 362/377 (96%)
> Frame = -3
>
> Query: 496563
> MEQRKKLTGRHTRALYLWFFPSQESKMNPTDLFLSTTEYDYGYDENTAPCNEGNSFPRFK 496384
>
> MEQRKKLTGRHTRALYLWFFPSQESKMNPTDLFLSTTEYDYGYDENTAPCNEGNSFPRFK
> Sbjct: 1
> MEQRKKLTGRHTRALYLWFFPSQESKMNPTDLFLSTTEYDYGYDENTAPCNEGNSFPRFK 60
>
> Query: 496383
> SLFLPILYCLVFVFCLLGNSLVLWILLTRKRLMTMTDICLLNLAASDLLFIVPLPFQAYY 496204
>
> SLFLPILYCLVFVFCLLGNSLVLWILLTRKRLMTMTDICLLNLAASDLLFIVPLPFQAYY
> Sbjct: 61
> SLFLPILYCLVFVFCLLGNSLVLWILLTRKRLMTMTDICLLNLAASDLLFIVPLPFQAYY 120
>
> Query: 496203
> ASDQWVFGNALCKIMGGIYYTGFYSSIFFITLMSIDRYIAIVHAVYAMKIRTASCGTMIS 496024
>
> ASDQWVFGNALCKIMGGIYYTGFYSSIFFITLMSIDRYIAIVHAVYAMKIRTASCGTMIS
> Sbjct: 121
> ASDQWVFGNALCKIMGGIYYTGFYSSIFFITLMSIDRYIAIVHAVYAMKIRTASCGTMIS 180
>
> Query: 496023
> LVLWLVAGLASVPNIVFNQQLEIEQSVQCVPVYPPGNNIWKVTTQFAANILGLLIPFSIL 495844
>
> LVLWLVAGLASVPNIVFNQQLEIEQSVQCVPVYPPGNNIWKVTTQFAANILGLLIPFSIL
> Sbjct: 181
> LVLWLVAGLASVPNIVFNQQLEIEQSVQCVPVYPPGNNIWKVTTQFAANILGLLIPFSIL 240
>
> Query: 495843
> IHCYAQILRNLRKCKNQNXXXXXXXXXXXXXXXFLFWTPFNVVLFLDSLQSLLIIDNCQA 495664
> IHCYAQILRNLRKCKNQN
> FLFWTPFNVVLFLDSLQSLLIIDNCQA
> Sbjct: 241
> IHCYAQILRNLRKCKNQNKIKAIKMIFIIVIVFFLFWTPFNVVLFLDSLQSLLIIDNCQA 300
>
> Query: 495663
> SSQITLALQLTETISFIHCCLNPVIYAFAGVTFKAHLKRLLQPCARILWSPTRGSGVTQS 495484
>
> SSQITLALQLTETISFIHCCLNPVIYAFAGVTFKAHLKRLLQPCARILWSPTRGSGVTQS
> Sbjct: 301
> SSQITLALQLTETISFIHCCLNPVIYAFAGVTFKAHLKRLLQPCARILWSPTRGSGVTQS 360
>
> Query: 495483 SLVLSQISGCSDSAGVL 495433
> SLVLSQISGCSDSAGVL
> Sbjct: 361 SLVLSQISGCSDSAGVL 377
>
>
> Database: ../chemokine_receptors_aa.txt
> Posted date: Jul 23, 2008 2:43 PM
> Number of letters in database: 5543
> Number of sequences in database: 15
>
> Lambda K H
> 0.318 0.134 0.401
>
> Gapped
> Lambda K H
> 0.267 0.0410 0.140
>
>
> Matrix: BLOSUM62
> Gap Penalties: Existence: 11, Extension: 1
> Number of Sequences: 15
> Number of Hits to DB: 28,029,184
> Number of extensions: 721812
> Number of successful extensions: 2651
> Number of sequences better than 1.0e-05: 15
> Number of HSP's gapped: 2358
> Number of HSP's successfully gapped: 148
> Length of query: 771480
> Length of database: 5543
> Length adjustment: 102
> Effective length of query: 771378
> Effective length of database: 4013
> Effective search space: 3095539914
> Effective search space used: 3095539914
> Neighboring words threshold: 12
> Window for multiple hits: 40
> X1: 16 ( 7.3 bits)
> X2: 38 (14.6 bits)
> X3: 64 (24.7 bits)
> S1: 41 (21.7 bits)
> S2: 33 (17.3 bits)
>
>
> Script:
> #!/usr/bin/perl
>
> use lib "/usr/local/bioperl-1.5.2_102";
> use Bio::SearchIO;
>
> my $searchio = Bio::SearchIO->new(-format => 'blast', -file =>
> "test.blast");
>
> while (my($result) = $searchio->next_result()) {
> unless (defined $result) {
> # there is an undefined result at the end of SearchIO
> objects
> last;
> }
>
> while(my $hit = $result->next_hit) {
> while (my $hsp = $hit->next_hsp) {
>
> print $hit->name, "\t";
> print $hsp->strand('hit'), "\t";
> print $hsp->frame, "\n";
> }
> }
> }
>
> Output:
> -bash-3.00$ perl parse_test_blast.pl
> chCCRc 0 2
>
>
> Head of Informatics
> Institute for Animal Health
> Compton
> Berks
> RG20 7NN
> 01635 578411
>
> http://www.iah.ac.uk/research/bioinformatics/bioinf.shtml
>
> The information contained in this message may be confidential or legally
> privileged and is intended solely for the addressee.
> If you have received this message in error please delete it & notify the
> originator immediately.
> Unauthorised use, disclosure, copying or alteration of this message is
> forbidden & may be unlawful.
> The contents of this e-mail are the views of the sender and do not
> necessarily represent the views of the Institute.
> This email and associated attachments has been checked locally for
> viruses but we can accept no responsibility once it has left our
> systems.
> Communications on Institute computers are monitored to secure the
> effective operation of the systems and for other lawful purposes.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list