[Bioperl-l] how to get the information of Strand = Plus / Plus from blastn report by bioperl.

kakchingtabam pawankumar sharma pawan.mani2 at gmail.com
Wed Jan 25 07:12:06 UTC 2012


Hi Frank,
Thanks for your kind reply. I have done this in my script even though
i could get the top hit still for every query. Still all hits are
extracted by my script. So kindly help me to solve this problem.

With best regards,
Pawan

On Tue, Jan 24, 2012 at 5:13 PM, Frank Schwach <fs5 at sanger.ac.uk> wrote:
> I think that's an option that you can set when you ask for a new BLAST
> parser:
> my $searchio = new Bio::SearchIO(
>  -format => 'blast',
>  -file   => 't/data/ecolitst.bls',
>  -best_hit_only => 1,
> );
>
> now when you use the same script that you have been using so far (loop over
> all hits), there will only be one hit per result.
>
> Frank
>
>
>
> On 24/01/12 11:31, kakchingtabam pawankumar sharma wrote:
>>
>> Hi,
>>       Thanks a lot for help Frank. It works for every Blast output.
>> One more question is that i want to best hit only(top hit of every query).
>> I show there is option called
>> $obj->best_hit_only; in Bio::SearchIO module.
>> So help to add this to my script.
>> I could not do. Its confusing.
>>
>> Thanks in Advanced.
>>
>> With best regards,
>> Pawan
>>
>>
>> On Mon, Jan 16, 2012 at 9:05 PM, Frank Schwach<fs5 at sanger.ac.uk>  wrote:
>>>
>>> Excellent, well done!
>>> No, this is the way to do it. In BioPerl modules that use strand
>>> information
>>> you will find the values +1/-1 or undef. If you want to display those as
>>> PLUS/MINUS,+/-,Watson/Crick,Laurel/Hardy whatever, you have to convert
>>> it,
>>> but you know now how to do it.
>>> You have a syntax error in your code where you retrieve the query name:
>>>
>>>
>>> my $QueryName = $result->query_name(), my $QueryDescript =
>>> $result->query_description();
>>>
>>> should be two lines and the comma should be a semicolon.
>>>
>>> Good luck!
>>>
>>> Frank
>>>
>>>
>>>
>>>
>>>
>>> On 16/01/12 15:14, kakchingtabam pawankumar sharma wrote:
>>>>
>>>>
>>>> So By using the if else conditon function, I have solve Frank.
>>>> I mean is there anyway in bioperl we can get directly using other
>>>> module! I hope u got it!
>>>>
>>>>
>>>> So my second Question have not replied that is
>>>>
>>>> i have blastn report as below:
>>>>
>>>> BLASTN 2.2.18 [Mar-02-2008]
>>>>
>>>>
>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>>>> Schaffer,
>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
>>>> programs",  Nucleic Acids Res. 25:3389-3402.
>>>>
>>>> Query= ORB_1210001_hsa-miR-548aa#5_1
>>>>         (24 letters)
>>>>
>>>> Database: hsa-mmu-rno_miRNA.fa
>>>>           3524 sequences; 76,424 total letters
>>>>
>>>> Searching..................................................done
>>>>
>>>>
>>>>
>>>>                                                                 Score
>>>>  E
>>>> Sequences producing significant alignments:                      (bits)
>>>> Value
>>>>
>>>> hsa-miR-548aa
>>>> 48   2e-009
>>>> hsa-miR-548d-5p
>>>> 36   9e-006
>>>> hsa-miR-548b-5p
>>>> 36   9e-006
>>>> hsa-miR-548z
>>>> 34   3e-005
>>>> hsa-miR-548q
>>>> 30   5e-004
>>>> hsa-miR-548n
>>>> 30   5e-004
>>>> hsa-miR-548ab
>>>> 28   0.002
>>>> hsa-miR-548v
>>>> 28   0.002
>>>> hsa-miR-548c-5p
>>>> 28   0.002
>>>> hsa-miR-548ag
>>>> 26   0.008
>>>> hsa-miR-548u
>>>> 26   0.008
>>>> hsa-miR-548c-3p
>>>> 26   0.008
>>>> hsa-miR-603
>>>> 26   0.008
>>>> hsa-miR-548a-3p
>>>> 26   0.008
>>>> hsa-miR-548ac
>>>> 24   0.033
>>>> hsa-miR-548an
>>>> 22   0.13
>>>> hsa-miR-548aj
>>>> 22   0.13
>>>> hsa-miR-548i
>>>> 22   0.13
>>>> hsa-miR-548g
>>>> 22   0.13
>>>> hsa-miR-548j
>>>> 22   0.13
>>>> hsa-miR-548a-5p
>>>> 22   0.13
>>>>
>>>>> hsa-miR-548aa
>>>>
>>>>
>>>>          Length = 25
>>>>
>>>>  Score = 48.1 bits (24), Expect = 2e-009
>>>>  Identities = 24/24 (100%)
>>>>  Strand = Plus / Minus
>>>>
>>>>
>>>> Query: 1  tggtgcaaaagtaattgtggtttt 24
>>>>          ||||||||||||||||||||||||
>>>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2
>>>>
>>>>
>>>>> hsa-miR-548d-5p
>>>>
>>>>
>>>>          Length = 22
>>>>
>>>>  Score = 36.2 bits (18), Expect = 9e-006
>>>>  Identities = 18/18 (100%)
>>>>  Strand = Plus / Plus
>>>>
>>>>
>>>> Query: 7  aaaagtaattgtggtttt 24
>>>>          ||||||||||||||||||
>>>> Sbjct: 1  aaaagtaattgtggtttt 18
>>>>
>>>>
>>>>
>>>> in this result i could not parse my code. i think my code does not
>>>> accept the Query header that is
>>>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast
>>>> output.
>>>>
>>>> kindly help me out.
>>>>
>>>> with regards,
>>>> Pawan.
>>>>
>>>> On 1/16/12, Frank Schwach<fs5 at sanger.ac.uk>    wrote:
>>>>>
>>>>>
>>>>> Hi Pawan ,
>>>>>
>>>>> Please always "reply to all", so that you keep the discussion on the
>>>>> bioperl mailing list and more people can help you.
>>>>> What you need is a very basic Perl command. I could give you the code
>>>>> but I think you get more out of it if you experiment with it on your
>>>>> own
>>>>> because it is very fundamental. I'll point you in the right direction:
>>>>> you want an if-then-else conditional construct.
>>>>>
>>>>> Perl's documentation about this is here:
>>>>>
>>>>> http://perldoc.perl.org/perlintro.html#Conditional-and-looping-constructs
>>>>>
>>>>> if strand is 1 you want to print "PLUS" else if it is -1 you want to
>>>>> print "MINUS", or else you might want to print "no strand" or
>>>>> something,
>>>>> or even treat it as an error and make the script abort.
>>>>>
>>>>> Give it a go and let us know if you need help. For basic (non-bio) Perl
>>>>> question, please also check out the community at
>>>>> http://www.perlmonks.org/.
>>>>>
>>>>> Hope that helps,
>>>>>
>>>>> Frank
>>>>>
>>>>>
>>>>> On 14/01/12 05:59, kakchingtabam pawankumar sharma wrote:
>>>>>>
>>>>>>
>>>>>> Hi frank,
>>>>>>
>>>>>> Thanks for your kind reply.
>>>>>> I could get the vale for query as 1 value if it is plus.
>>>>>> and for hit = -1 if it is minus.
>>>>>> But i would like to print out as PLUS or MINUS not 1 or -1 my friend.
>>>>>>
>>>>>> you can see my code as below:
>>>>>>
>>>>>> while ( my $result = $searchio->next_result() ) {
>>>>>>      my $QueryName = $result->query_name(), my $QueryDescript =
>>>>>> $result->query_description();
>>>>>>      my $QueryLength = $result->query_length;
>>>>>>      my $NoHits = $result->num_hits;
>>>>>>
>>>>>>      while( my $hit = $result->next_hit ) {
>>>>>>          my $HitName = $hit->name();
>>>>>>          my $HitDescrip = $hit->description();
>>>>>>        my $HitLength = $hit->length;
>>>>>>          my $Score = $hit->raw_score();
>>>>>>        my $Bits = $hit->bits;
>>>>>>
>>>>>>          my $hsp = $hit->next_hsp; # Only check first (= best) hsp
>>>>>>        my $Evalue =  $hsp->evalue();
>>>>>>        my $AlnLen = $hsp->num_identical();
>>>>>>        my $TotalLen = $hsp->hsp_length;
>>>>>>        my $QueryStrand = $hsp->strand('query');
>>>>>>        my $HitStrand = $hsp->strand('hit');
>>>>>>
>>>>>>        if($Evalue<     $cutoff){
>>>>>>            print "$QueryName $QueryDescript\t";
>>>>>>            print "$QueryLength\t";
>>>>>>            print "$NoHits\t";
>>>>>>            print "$HitName $HitDescrip\t";
>>>>>>            print "$HitLength\t";
>>>>>>            print "$Score\t";
>>>>>>            print "$Bits\t";
>>>>>>            print "$Evalue\t";
>>>>>>            print "$AlnLen\t";
>>>>>>            print "$TotalLen\t";
>>>>>>            print "$QueryStrand\t";
>>>>>>            print "$HitStrand\n";
>>>>>>        }
>>>>>>      }
>>>>>>      print "\n";
>>>>>> }
>>>>>>
>>>>>>
>>>>>> This is a part of my code.
>>>>>>
>>>>>> i have blastn report as below:
>>>>>>
>>>>>> BLASTN 2.2.18 [Mar-02-2008]
>>>>>>
>>>>>>
>>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>>>>>> Schaffer,
>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
>>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database
>>>>>> search
>>>>>> programs",  Nucleic Acids Res. 25:3389-3402.
>>>>>>
>>>>>> Query= ORB_1210001_hsa-miR-548aa#5_1
>>>>>>           (24 letters)
>>>>>>
>>>>>> Database: hsa-mmu-rno_miRNA.fa
>>>>>>             3524 sequences; 76,424 total letters
>>>>>>
>>>>>> Searching..................................................done
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Score
>>>>>> E
>>>>>> Sequences producing significant alignments:
>>>>>>  (bits)
>>>>>> Value
>>>>>>
>>>>>> hsa-miR-548aa
>>>>>> 48   2e-009
>>>>>> hsa-miR-548d-5p
>>>>>> 36   9e-006
>>>>>> hsa-miR-548b-5p
>>>>>> 36   9e-006
>>>>>> hsa-miR-548z
>>>>>> 34   3e-005
>>>>>> hsa-miR-548q
>>>>>> 30   5e-004
>>>>>> hsa-miR-548n
>>>>>> 30   5e-004
>>>>>> hsa-miR-548ab
>>>>>> 28   0.002
>>>>>> hsa-miR-548v
>>>>>> 28   0.002
>>>>>> hsa-miR-548c-5p
>>>>>> 28   0.002
>>>>>> hsa-miR-548ag
>>>>>> 26   0.008
>>>>>> hsa-miR-548u
>>>>>> 26   0.008
>>>>>> hsa-miR-548c-3p
>>>>>> 26   0.008
>>>>>> hsa-miR-603
>>>>>> 26   0.008
>>>>>> hsa-miR-548a-3p
>>>>>> 26   0.008
>>>>>> hsa-miR-548ac
>>>>>> 24   0.033
>>>>>> hsa-miR-548an
>>>>>> 22   0.13
>>>>>> hsa-miR-548aj
>>>>>> 22   0.13
>>>>>> hsa-miR-548i
>>>>>> 22   0.13
>>>>>> hsa-miR-548g
>>>>>> 22   0.13
>>>>>> hsa-miR-548j
>>>>>> 22   0.13
>>>>>> hsa-miR-548a-5p
>>>>>> 22   0.13
>>>>>>
>>>>>>> hsa-miR-548aa
>>>>>>
>>>>>>
>>>>>>            Length = 25
>>>>>>
>>>>>>   Score = 48.1 bits (24), Expect = 2e-009
>>>>>>   Identities = 24/24 (100%)
>>>>>>   Strand = Plus / Minus
>>>>>>
>>>>>>
>>>>>> Query: 1  tggtgcaaaagtaattgtggtttt 24
>>>>>>            ||||||||||||||||||||||||
>>>>>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2
>>>>>>
>>>>>>
>>>>>>> hsa-miR-548d-5p
>>>>>>
>>>>>>
>>>>>>            Length = 22
>>>>>>
>>>>>>   Score = 36.2 bits (18), Expect = 9e-006
>>>>>>   Identities = 18/18 (100%)
>>>>>>   Strand = Plus / Plus
>>>>>>
>>>>>>
>>>>>> Query: 7  aaaagtaattgtggtttt 24
>>>>>>            ||||||||||||||||||
>>>>>> Sbjct: 1  aaaagtaattgtggtttt 18
>>>>>>
>>>>>>
>>>>>>
>>>>>> in this result i could not parse my code. i think my code does not
>>>>>> accept the Query header that is
>>>>>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast
>>>>>> output.
>>>>>>
>>>>>> kindly help me out.
>>>>>>
>>>>>> with regards,
>>>>>> Pawan.
>>>>>>
>>>>>>
>>>>>> On Sat, Jan 14, 2012 at 3:13 AM, Frank Schwach<fs5 at sanger.ac.uk>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Pawan,
>>>>>>>
>>>>>>> Can you show your code? Is it basically following the structure shown
>>>>>>> in
>>>>>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
>>>>>>> ?
>>>>>>>
>>>>>>> If that is the case
>>>>>>>
>>>>>>> $hsp->strand('query')
>>>>>>>
>>>>>>>
>>>>>>> is exactly what you need.
>>>>>>> To check if hit and query are on different strands you can do:
>>>>>>>
>>>>>>> if ( $hsp->strand('query')
>>>>>>> * $hsp->strand('hit') == -1){
>>>>>>>
>>>>>>>   # do whatever you need to do if they are on opposite strands
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>> Hope that helps
>>>>>>>
>>>>>>> Frank
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 13/01/12 16:46, kakchingtabam pawankumar sharma wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>               Using Bio::SearchIO module I am parsing the following
>>>>>>>> Blast
>>>>>>>> result.
>>>>>>>> I have used the option- $hsp->strand('query').
>>>>>>>>
>>>>>>>> But I cannot get detail of alignment.
>>>>>>>>
>>>>>>>> I need to know if my hit is forward (Strand = Plus / Plus)
>>>>>>>> or reverse ( Strand = Plus / Minus)...
>>>>>>>>   Can anyone help me to get report as Plus or Minus for query  or
>>>>>>>> hit.
>>>>>>>>
>>>>>>>> thanks in advanced.
>>>>>>>>
>>>>>>>> With regards,
>>>>>>>> Pawan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> BLASTN 2.2.18 [Dec-23-2011]
>>>>>>>>
>>>>>>>>
>>>>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>>>>>>>> Schaffer,
>>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
>>>>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database
>>>>>>>> search
>>>>>>>> programs",  Nucleic Acids Res. 25:3389-3402.
>>>>>>>>
>>>>>>>> Query= 000013_c10079-9984
>>>>>>>>           (50 letters)
>>>>>>>>
>>>>>>>> Database: Cyano_Probe.fasta
>>>>>>>>             4760 sequences; 238,000 total letters
>>>>>>>>
>>>>>>>> Searching..................................................done
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Score
>>>>>>>>   E
>>>>>>>> Sequences producing significant alignments:
>>>>>>>>  (bits)
>>>>>>>> Value
>>>>>>>>
>>>>>>>> 000013_c10079-9984
>>>>>>>> 100   7e-024
>>>>>>>> 002619_2689273-2690037
>>>>>>>> 24   0.36
>>>>>>>> 001126_c1123720-1123385
>>>>>>>> 24   0.36
>>>>>>>> 003211_c3326737-3326480
>>>>>>>> 22   1.4
>>>>>>>> 002415_2471082-2471420
>>>>>>>> 22   1.4
>>>>>>>> 002269_2321276-2322463
>>>>>>>> 22   1.4
>>>>>>>> 001328_c1326535-1326164
>>>>>>>> 22   1.4
>>>>>>>>
>>>>>>>>> 000013_c10079-9984
>>>>>>>>
>>>>>>>>
>>>>>>>>            Length = 50
>>>>>>>>
>>>>>>>>   Score = 99.6 bits (50), Expect = 7e-024
>>>>>>>>   Identities = 50/50 (100%)
>>>>>>>>   Strand = Plus / Plus
>>>>>>>>
>>>>>>>>
>>>>>>>> Query: 1  agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50
>>>>>>>>            ||||||||||||||||||||||||||||||||||||||||||||||||||
>>>>>>>> Sbjct: 1  agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research
>>>>>>> Limited,
>>>>>>> a charity registered in England with number 1021457 and a company
>>>>>>> registered
>>>>>>> in England with number 2742969, whose registered office is 215 Euston
>>>>>>> Road,
>>>>>>> London, NW1 2BE.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>  The Wellcome Trust Sanger Institute is operated by Genome Research
>>>>>  Limited, a charity registered in England with number 1021457 and a
>>>>>  company registered in England with number 2742969, whose registered
>>>>>  office is 215 Euston Road, London, NW1 2BE.
>>>>>
>>>
>>>
>>> --
>>> The Wellcome Trust Sanger Institute is operated by Genome Research
>>> Limited,
>>> a charity registered in England with number 1021457 and a company
>>> registered
>>> in England with number 2742969, whose registered office is 215 Euston
>>> Road,
>>> London, NW1 2BE.
>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited,
> a charity registered in England with number 1021457 and a company registered
> in England with number 2742969, whose registered office is 215 Euston Road,
> London, NW1 2BE.




More information about the Bioperl-l mailing list