[Bioperl-l] how to get the information of Strand = Plus / Plus from blastn report by bioperl.
Frank Schwach
fs5 at sanger.ac.uk
Wed Jan 25 09:37:51 UTC 2012
can you post your script please?
On 25/01/12 07:12, kakchingtabam pawankumar sharma wrote:
> Hi Frank,
> Thanks for your kind reply. I have done this in my script even though
> i could get the top hit still for every query. Still all hits are
> extracted by my script. So kindly help me to solve this problem.
>
> With best regards,
> Pawan
>
> On Tue, Jan 24, 2012 at 5:13 PM, Frank Schwach<fs5 at sanger.ac.uk> wrote:
>> I think that's an option that you can set when you ask for a new BLAST
>> parser:
>> my $searchio = new Bio::SearchIO(
>> -format => 'blast',
>> -file => 't/data/ecolitst.bls',
>> -best_hit_only => 1,
>> );
>>
>> now when you use the same script that you have been using so far (loop over
>> all hits), there will only be one hit per result.
>>
>> Frank
>>
>>
>>
>> On 24/01/12 11:31, kakchingtabam pawankumar sharma wrote:
>>>
>>> Hi,
>>> Thanks a lot for help Frank. It works for every Blast output.
>>> One more question is that i want to best hit only(top hit of every query).
>>> I show there is option called
>>> $obj->best_hit_only; in Bio::SearchIO module.
>>> So help to add this to my script.
>>> I could not do. Its confusing.
>>>
>>> Thanks in Advanced.
>>>
>>> With best regards,
>>> Pawan
>>>
>>>
>>> On Mon, Jan 16, 2012 at 9:05 PM, Frank Schwach<fs5 at sanger.ac.uk> wrote:
>>>>
>>>> Excellent, well done!
>>>> No, this is the way to do it. In BioPerl modules that use strand
>>>> information
>>>> you will find the values +1/-1 or undef. If you want to display those as
>>>> PLUS/MINUS,+/-,Watson/Crick,Laurel/Hardy whatever, you have to convert
>>>> it,
>>>> but you know now how to do it.
>>>> You have a syntax error in your code where you retrieve the query name:
>>>>
>>>>
>>>> my $QueryName = $result->query_name(), my $QueryDescript =
>>>> $result->query_description();
>>>>
>>>> should be two lines and the comma should be a semicolon.
>>>>
>>>> Good luck!
>>>>
>>>> Frank
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 16/01/12 15:14, kakchingtabam pawankumar sharma wrote:
>>>>>
>>>>>
>>>>> So By using the if else conditon function, I have solve Frank.
>>>>> I mean is there anyway in bioperl we can get directly using other
>>>>> module! I hope u got it!
>>>>>
>>>>>
>>>>> So my second Question have not replied that is
>>>>>
>>>>> i have blastn report as below:
>>>>>
>>>>> BLASTN 2.2.18 [Mar-02-2008]
>>>>>
>>>>>
>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>>>>> Schaffer,
>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
>>>>> programs", Nucleic Acids Res. 25:3389-3402.
>>>>>
>>>>> Query= ORB_1210001_hsa-miR-548aa#5_1
>>>>> (24 letters)
>>>>>
>>>>> Database: hsa-mmu-rno_miRNA.fa
>>>>> 3524 sequences; 76,424 total letters
>>>>>
>>>>> Searching..................................................done
>>>>>
>>>>>
>>>>>
>>>>> Score
>>>>> E
>>>>> Sequences producing significant alignments: (bits)
>>>>> Value
>>>>>
>>>>> hsa-miR-548aa
>>>>> 48 2e-009
>>>>> hsa-miR-548d-5p
>>>>> 36 9e-006
>>>>> hsa-miR-548b-5p
>>>>> 36 9e-006
>>>>> hsa-miR-548z
>>>>> 34 3e-005
>>>>> hsa-miR-548q
>>>>> 30 5e-004
>>>>> hsa-miR-548n
>>>>> 30 5e-004
>>>>> hsa-miR-548ab
>>>>> 28 0.002
>>>>> hsa-miR-548v
>>>>> 28 0.002
>>>>> hsa-miR-548c-5p
>>>>> 28 0.002
>>>>> hsa-miR-548ag
>>>>> 26 0.008
>>>>> hsa-miR-548u
>>>>> 26 0.008
>>>>> hsa-miR-548c-3p
>>>>> 26 0.008
>>>>> hsa-miR-603
>>>>> 26 0.008
>>>>> hsa-miR-548a-3p
>>>>> 26 0.008
>>>>> hsa-miR-548ac
>>>>> 24 0.033
>>>>> hsa-miR-548an
>>>>> 22 0.13
>>>>> hsa-miR-548aj
>>>>> 22 0.13
>>>>> hsa-miR-548i
>>>>> 22 0.13
>>>>> hsa-miR-548g
>>>>> 22 0.13
>>>>> hsa-miR-548j
>>>>> 22 0.13
>>>>> hsa-miR-548a-5p
>>>>> 22 0.13
>>>>>
>>>>>> hsa-miR-548aa
>>>>>
>>>>>
>>>>> Length = 25
>>>>>
>>>>> Score = 48.1 bits (24), Expect = 2e-009
>>>>> Identities = 24/24 (100%)
>>>>> Strand = Plus / Minus
>>>>>
>>>>>
>>>>> Query: 1 tggtgcaaaagtaattgtggtttt 24
>>>>> ||||||||||||||||||||||||
>>>>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2
>>>>>
>>>>>
>>>>>> hsa-miR-548d-5p
>>>>>
>>>>>
>>>>> Length = 22
>>>>>
>>>>> Score = 36.2 bits (18), Expect = 9e-006
>>>>> Identities = 18/18 (100%)
>>>>> Strand = Plus / Plus
>>>>>
>>>>>
>>>>> Query: 7 aaaagtaattgtggtttt 24
>>>>> ||||||||||||||||||
>>>>> Sbjct: 1 aaaagtaattgtggtttt 18
>>>>>
>>>>>
>>>>>
>>>>> in this result i could not parse my code. i think my code does not
>>>>> accept the Query header that is
>>>>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast
>>>>> output.
>>>>>
>>>>> kindly help me out.
>>>>>
>>>>> with regards,
>>>>> Pawan.
>>>>>
>>>>> On 1/16/12, Frank Schwach<fs5 at sanger.ac.uk> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Pawan ,
>>>>>>
>>>>>> Please always "reply to all", so that you keep the discussion on the
>>>>>> bioperl mailing list and more people can help you.
>>>>>> What you need is a very basic Perl command. I could give you the code
>>>>>> but I think you get more out of it if you experiment with it on your
>>>>>> own
>>>>>> because it is very fundamental. I'll point you in the right direction:
>>>>>> you want an if-then-else conditional construct.
>>>>>>
>>>>>> Perl's documentation about this is here:
>>>>>>
>>>>>> http://perldoc.perl.org/perlintro.html#Conditional-and-looping-constructs
>>>>>>
>>>>>> if strand is 1 you want to print "PLUS" else if it is -1 you want to
>>>>>> print "MINUS", or else you might want to print "no strand" or
>>>>>> something,
>>>>>> or even treat it as an error and make the script abort.
>>>>>>
>>>>>> Give it a go and let us know if you need help. For basic (non-bio) Perl
>>>>>> question, please also check out the community at
>>>>>> http://www.perlmonks.org/.
>>>>>>
>>>>>> Hope that helps,
>>>>>>
>>>>>> Frank
>>>>>>
>>>>>>
>>>>>> On 14/01/12 05:59, kakchingtabam pawankumar sharma wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi frank,
>>>>>>>
>>>>>>> Thanks for your kind reply.
>>>>>>> I could get the vale for query as 1 value if it is plus.
>>>>>>> and for hit = -1 if it is minus.
>>>>>>> But i would like to print out as PLUS or MINUS not 1 or -1 my friend.
>>>>>>>
>>>>>>> you can see my code as below:
>>>>>>>
>>>>>>> while ( my $result = $searchio->next_result() ) {
>>>>>>> my $QueryName = $result->query_name(), my $QueryDescript =
>>>>>>> $result->query_description();
>>>>>>> my $QueryLength = $result->query_length;
>>>>>>> my $NoHits = $result->num_hits;
>>>>>>>
>>>>>>> while( my $hit = $result->next_hit ) {
>>>>>>> my $HitName = $hit->name();
>>>>>>> my $HitDescrip = $hit->description();
>>>>>>> my $HitLength = $hit->length;
>>>>>>> my $Score = $hit->raw_score();
>>>>>>> my $Bits = $hit->bits;
>>>>>>>
>>>>>>> my $hsp = $hit->next_hsp; # Only check first (= best) hsp
>>>>>>> my $Evalue = $hsp->evalue();
>>>>>>> my $AlnLen = $hsp->num_identical();
>>>>>>> my $TotalLen = $hsp->hsp_length;
>>>>>>> my $QueryStrand = $hsp->strand('query');
>>>>>>> my $HitStrand = $hsp->strand('hit');
>>>>>>>
>>>>>>> if($Evalue< $cutoff){
>>>>>>> print "$QueryName $QueryDescript\t";
>>>>>>> print "$QueryLength\t";
>>>>>>> print "$NoHits\t";
>>>>>>> print "$HitName $HitDescrip\t";
>>>>>>> print "$HitLength\t";
>>>>>>> print "$Score\t";
>>>>>>> print "$Bits\t";
>>>>>>> print "$Evalue\t";
>>>>>>> print "$AlnLen\t";
>>>>>>> print "$TotalLen\t";
>>>>>>> print "$QueryStrand\t";
>>>>>>> print "$HitStrand\n";
>>>>>>> }
>>>>>>> }
>>>>>>> print "\n";
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> This is a part of my code.
>>>>>>>
>>>>>>> i have blastn report as below:
>>>>>>>
>>>>>>> BLASTN 2.2.18 [Mar-02-2008]
>>>>>>>
>>>>>>>
>>>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>>>>>>> Schaffer,
>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
>>>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database
>>>>>>> search
>>>>>>> programs", Nucleic Acids Res. 25:3389-3402.
>>>>>>>
>>>>>>> Query= ORB_1210001_hsa-miR-548aa#5_1
>>>>>>> (24 letters)
>>>>>>>
>>>>>>> Database: hsa-mmu-rno_miRNA.fa
>>>>>>> 3524 sequences; 76,424 total letters
>>>>>>>
>>>>>>> Searching..................................................done
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Score
>>>>>>> E
>>>>>>> Sequences producing significant alignments:
>>>>>>> (bits)
>>>>>>> Value
>>>>>>>
>>>>>>> hsa-miR-548aa
>>>>>>> 48 2e-009
>>>>>>> hsa-miR-548d-5p
>>>>>>> 36 9e-006
>>>>>>> hsa-miR-548b-5p
>>>>>>> 36 9e-006
>>>>>>> hsa-miR-548z
>>>>>>> 34 3e-005
>>>>>>> hsa-miR-548q
>>>>>>> 30 5e-004
>>>>>>> hsa-miR-548n
>>>>>>> 30 5e-004
>>>>>>> hsa-miR-548ab
>>>>>>> 28 0.002
>>>>>>> hsa-miR-548v
>>>>>>> 28 0.002
>>>>>>> hsa-miR-548c-5p
>>>>>>> 28 0.002
>>>>>>> hsa-miR-548ag
>>>>>>> 26 0.008
>>>>>>> hsa-miR-548u
>>>>>>> 26 0.008
>>>>>>> hsa-miR-548c-3p
>>>>>>> 26 0.008
>>>>>>> hsa-miR-603
>>>>>>> 26 0.008
>>>>>>> hsa-miR-548a-3p
>>>>>>> 26 0.008
>>>>>>> hsa-miR-548ac
>>>>>>> 24 0.033
>>>>>>> hsa-miR-548an
>>>>>>> 22 0.13
>>>>>>> hsa-miR-548aj
>>>>>>> 22 0.13
>>>>>>> hsa-miR-548i
>>>>>>> 22 0.13
>>>>>>> hsa-miR-548g
>>>>>>> 22 0.13
>>>>>>> hsa-miR-548j
>>>>>>> 22 0.13
>>>>>>> hsa-miR-548a-5p
>>>>>>> 22 0.13
>>>>>>>
>>>>>>>> hsa-miR-548aa
>>>>>>>
>>>>>>>
>>>>>>> Length = 25
>>>>>>>
>>>>>>> Score = 48.1 bits (24), Expect = 2e-009
>>>>>>> Identities = 24/24 (100%)
>>>>>>> Strand = Plus / Minus
>>>>>>>
>>>>>>>
>>>>>>> Query: 1 tggtgcaaaagtaattgtggtttt 24
>>>>>>> ||||||||||||||||||||||||
>>>>>>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2
>>>>>>>
>>>>>>>
>>>>>>>> hsa-miR-548d-5p
>>>>>>>
>>>>>>>
>>>>>>> Length = 22
>>>>>>>
>>>>>>> Score = 36.2 bits (18), Expect = 9e-006
>>>>>>> Identities = 18/18 (100%)
>>>>>>> Strand = Plus / Plus
>>>>>>>
>>>>>>>
>>>>>>> Query: 7 aaaagtaattgtggtttt 24
>>>>>>> ||||||||||||||||||
>>>>>>> Sbjct: 1 aaaagtaattgtggtttt 18
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> in this result i could not parse my code. i think my code does not
>>>>>>> accept the Query header that is
>>>>>>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast
>>>>>>> output.
>>>>>>>
>>>>>>> kindly help me out.
>>>>>>>
>>>>>>> with regards,
>>>>>>> Pawan.
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Jan 14, 2012 at 3:13 AM, Frank Schwach<fs5 at sanger.ac.uk>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Pawan,
>>>>>>>>
>>>>>>>> Can you show your code? Is it basically following the structure shown
>>>>>>>> in
>>>>>>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
>>>>>>>> ?
>>>>>>>>
>>>>>>>> If that is the case
>>>>>>>>
>>>>>>>> $hsp->strand('query')
>>>>>>>>
>>>>>>>>
>>>>>>>> is exactly what you need.
>>>>>>>> To check if hit and query are on different strands you can do:
>>>>>>>>
>>>>>>>> if ( $hsp->strand('query')
>>>>>>>> * $hsp->strand('hit') == -1){
>>>>>>>>
>>>>>>>> # do whatever you need to do if they are on opposite strands
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> Hope that helps
>>>>>>>>
>>>>>>>> Frank
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 13/01/12 16:46, kakchingtabam pawankumar sharma wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> Using Bio::SearchIO module I am parsing the following
>>>>>>>>> Blast
>>>>>>>>> result.
>>>>>>>>> I have used the option- $hsp->strand('query').
>>>>>>>>>
>>>>>>>>> But I cannot get detail of alignment.
>>>>>>>>>
>>>>>>>>> I need to know if my hit is forward (Strand = Plus / Plus)
>>>>>>>>> or reverse ( Strand = Plus / Minus)...
>>>>>>>>> Can anyone help me to get report as Plus or Minus for query or
>>>>>>>>> hit.
>>>>>>>>>
>>>>>>>>> thanks in advanced.
>>>>>>>>>
>>>>>>>>> With regards,
>>>>>>>>> Pawan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> BLASTN 2.2.18 [Dec-23-2011]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>>>>>>>>> Schaffer,
>>>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
>>>>>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database
>>>>>>>>> search
>>>>>>>>> programs", Nucleic Acids Res. 25:3389-3402.
>>>>>>>>>
>>>>>>>>> Query= 000013_c10079-9984
>>>>>>>>> (50 letters)
>>>>>>>>>
>>>>>>>>> Database: Cyano_Probe.fasta
>>>>>>>>> 4760 sequences; 238,000 total letters
>>>>>>>>>
>>>>>>>>> Searching..................................................done
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Score
>>>>>>>>> E
>>>>>>>>> Sequences producing significant alignments:
>>>>>>>>> (bits)
>>>>>>>>> Value
>>>>>>>>>
>>>>>>>>> 000013_c10079-9984
>>>>>>>>> 100 7e-024
>>>>>>>>> 002619_2689273-2690037
>>>>>>>>> 24 0.36
>>>>>>>>> 001126_c1123720-1123385
>>>>>>>>> 24 0.36
>>>>>>>>> 003211_c3326737-3326480
>>>>>>>>> 22 1.4
>>>>>>>>> 002415_2471082-2471420
>>>>>>>>> 22 1.4
>>>>>>>>> 002269_2321276-2322463
>>>>>>>>> 22 1.4
>>>>>>>>> 001328_c1326535-1326164
>>>>>>>>> 22 1.4
>>>>>>>>>
>>>>>>>>>> 000013_c10079-9984
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Length = 50
>>>>>>>>>
>>>>>>>>> Score = 99.6 bits (50), Expect = 7e-024
>>>>>>>>> Identities = 50/50 (100%)
>>>>>>>>> Strand = Plus / Plus
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Query: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50
>>>>>>>>> ||||||||||||||||||||||||||||||||||||||||||||||||||
>>>>>>>>> Sbjct: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioperl-l mailing list
>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research
>>>>>>>> Limited,
>>>>>>>> a charity registered in England with number 1021457 and a company
>>>>>>>> registered
>>>>>>>> in England with number 2742969, whose registered office is 215 Euston
>>>>>>>> Road,
>>>>>>>> London, NW1 2BE.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research
>>>>>> Limited, a charity registered in England with number 1021457 and a
>>>>>> company registered in England with number 2742969, whose registered
>>>>>> office is 215 Euston Road, London, NW1 2BE.
>>>>>>
>>>>
>>>>
>>>> --
>>>> The Wellcome Trust Sanger Institute is operated by Genome Research
>>>> Limited,
>>>> a charity registered in England with number 1021457 and a company
>>>> registered
>>>> in England with number 2742969, whose registered office is 215 Euston
>>>> Road,
>>>> London, NW1 2BE.
>>
>>
>>
>> --
>> The Wellcome Trust Sanger Institute is operated by Genome Research Limited,
>> a charity registered in England with number 1021457 and a company registered
>> in England with number 2742969, whose registered office is 215 Euston Road,
>> London, NW1 2BE.
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
More information about the Bioperl-l
mailing list