[Bioperl-l] blast and length adjustment
Fields, Christopher J
cjfields at illinois.edu
Mon Aug 5 04:25:41 UTC 2013
Dimitar,
If I'm reading this correctly ('i found 2 sequences which hit on totally different RNAs but when i used cd-hit-est they cluster together'), the problem lies not in parsing BLAST output (Bio::SearchIO) but in the submission of the sequences for BLAST analysis. The code you point out has nothing to do with that: it is used in parsing BLAST *results*, not in submitting a BLAST analysis. As Jason implied, the problem is more likely in the submission than in the parsing.
Unfortunately, w/o a more specific example to work with (e.g. code, maybe a small test sequence) this won't be very productive; we're a little stuck and will just go in circles guessing at the problem w/o actually finding out whether this is a bug or not. I'm leaning towards 'not a bug' at the moment.
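To make the distinction concrete, here is a rough sketch of what the BlastUtils.pm snippet quoted below amounts to. It is only an illustration: `scale_aligned_lengths` is a made-up name, not a BioPerl function, and the lengths are invented. It assumes, as Jason describes, that aligned lengths on a translated side are reported in nucleotide coordinates and have to be divided by 3 to get to protein (alignment) scale.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of BioPerl): convert aligned lengths given in
# nucleotide coordinates to alignment (protein) scale for translated searches.
sub scale_aligned_lengths {
    my ($prog, $len_query, $len_sbjct) = @_;
    $prog = uc $prog;

    # Which side of the alignment is translated for each BLAST flavor.
    my $query_translated = ($prog eq 'BLASTX'  || $prog eq 'TBLASTX');
    my $sbjct_translated = ($prog eq 'TBLASTN' || $prog eq 'TBLASTX');

    # Three nucleotides per codon, so divide the translated side by 3.
    $len_query /= 3 if $query_translated;
    $len_sbjct /= 3 if $sbjct_translated;

    # BLASTN and BLASTP fall through with both lengths unchanged.
    return ($len_query, $len_sbjct);
}

# Made-up aligned lengths, just to show which side each flavor rescales.
for my $prog (qw(BLASTN BLASTX TBLASTN TBLASTX)) {
    my ($q, $s) = scale_aligned_lengths($prog, 300, 300);
    printf "%-8s query=%d sbjct=%d\n", $prog, $q, $s;
}
```

Note that BLASTN passes through untouched: this scaling is purely a coordinate conversion on parsed results and is unrelated to the length adjustment NCBI reports in its search statistics.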
chris
On Aug 4, 2013, at 8:21 PM, dimitark at bii.a-star.edu.sg wrote:
> Hi Jason,
> No, I was not interpreting it wrongly. I just found length correction only for those BLAST methods. I did not find any length correction for the blastn method, even though on the NCBI site I see they apply some length correction.
>
> So when I BLAST locally with BioPerl and blastn I get one result, and when I BLAST with blastn on NCBI I get a different result.
>
> So I was wondering whether there is such a length correction in BioPerl for blastn; I could not find one. I was also wondering whether such a correction should be implemented for blastn.
>
> Well thank you for your reply!
>
> Cheers
> Dimitar
>
> Quoting Jason Stajich <jason.stajich at gmail.com>:
>
>> On Aug 1, 2013, at 9:01 PM, dimitark at bii.a-star.edu.sg wrote:
>>
>>> Hi guys,
>>> I have a question about BLAST.
>>>
>>> I was working on a project where I BLAST against human RNA using BioPerl. I found 2 sequences which hit totally different RNAs, but when I used cd-hit-est they clustered together. I even aligned them and they were almost identical; from the NCBI aligner:
>>>
>>> Score: 2658 bits(1439)  Expect: 0.0  Identities: 1441/1442(99%)  Gaps: 0/1442(0%)  Strand: Plus/Plus
>>>
>>> Then I decided to BLAST them on NCBI, and they again hit different sequences.
>>> Then I checked the parameters of each search and found that both queries were length adjusted, i.e. some length was removed, namely around 30 nucleotides.
>>>
>>> I wanted to see what BioPerl does about that, and found the following in BlastUtils.pm:
>>>
>>> # Adjust length based on BLAST flavor.
>>> my $prog = $sbjct->algorithm;
>>> if($prog eq 'TBLASTN') {
>>>     $sbjct->{'_length_aln_sbjct'} /= 3;
>>> } elsif($prog eq 'BLASTX' ) {
>>>     $sbjct->{'_length_aln_query'} /= 3;
>>> } elsif($prog eq 'TBLASTX') {
>>>     $sbjct->{'_length_aln_query'} /= 3;
>>>     $sbjct->{'_length_aln_sbjct'} /= 3;
>>> }
>>
>> You are confusing the length adjustment that happens at NCBI with this length adjustment. The code above deals with translated searches. Notice that they are all divisions by 3: the coordinates presented in the BLAST results for a translated search are the original DNA/RNA coordinates, but when one wants to know the length in alignment space, it is really at the protein scale.
>>
>> So this is not the adjustment you seem to be looking for.
>>>
>>> But it seems there is no length adjustment for blastn like the one that seems to exist on NCBI.
>>>
>>> It's kind of frustrating, as I am trying to do some differential expression analysis with my own scripts. If these 2 seqs are so nearly identical they should have the same annotation, but they do not because of these strange BLAST results.
>>
>> No idea what you mean by the rest of this when it comes to your candidate RNA sequences or what you are seeking to find from the BLAST searches to help you on that front.
>>>
>>> I am really sorry if my post is a bit messy. If you have any questions about what I meant, please ask.
>>>
>>> Any comments would be greatly appreciated!
>>>
>>> Cheers
>>> D.
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org