[Bioperl-l] blast and length adjustment

dimitark at bii.a-star.edu.sg
Mon Aug 5 01:21:54 UTC 2013


Hi Jason,
No, I was not interpreting it wrongly. I just found length-correction
code only for those (translated) BLAST methods. I did not find any length
correction for the blastn method, even though on the NCBI site I can see
that they apply some length correction.

So when I BLAST locally with Bioperl and blastn I get one result, and
when I BLAST with blastn on NCBI I get a different result.

So I was wondering whether there is such a length correction in
Bioperl for blastn; I could not find one. I was also wondering whether
such a correction should be implemented for blastn.

Well, thank you for your reply!

Cheers
Dimitar

Quoting Jason Stajich <jason.stajich at gmail.com>:

> On Aug 1, 2013, at 9:01 PM, dimitark at bii.a-star.edu.sg wrote:
>
>> Hi guys,
>> I have a question about BLAST.
>>
>> I was working on a project where I BLAST against human RNA using
>> Bioperl. I found 2 sequences that hit totally different RNAs, but
>> when I ran cd-hit-est they clustered together. I even aligned them
>> and they were almost identical; from the NCBI aligner:
>>
>> Score = 2658 bits (1439), Expect = 0.0, Identities = 1441/1442 (99%), Gaps = 0/1442 (0%), Strand = Plus/Plus
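The Identities and Gaps fields in a score line like the one above are plain tallies over the columns of the pairwise alignment. As an illustration only, here is a minimal Perl sketch, assuming the two aligned sequences are given as equal-length strings with '-' marking gaps; `alignment_stats` is a hypothetical helper, not a Bioperl method.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): tally identities and gaps
# over the columns of a pairwise alignment given as two padded strings.
sub alignment_stats {
    my ($aln_q, $aln_s) = @_;
    die "aligned strings must have equal length\n"
        unless length($aln_q) == length($aln_s);

    my ($ident, $gaps) = (0, 0);
    for my $i (0 .. length($aln_q) - 1) {
        my $q = uc substr($aln_q, $i, 1);
        my $s = uc substr($aln_s, $i, 1);
        if ($q eq '-' || $s eq '-') {
            $gaps++;     # a gap in either row counts once for this column
        } elsif ($q eq $s) {
            $ident++;    # identical residue in this column
        }
    }
    # The alignment length is the number of columns, gap columns included.
    return ($ident, $gaps, length($aln_q));
}

my ($ident, $gaps, $len) = alignment_stats('ACGT-ACGTA', 'ACGTTACGAA');
printf "Identities = %d/%d, Gaps = %d/%d\n", $ident, $len, $gaps, $len;
```

The denominator is the alignment length rather than either sequence length, which is why both fields share it.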
>>
>> Then I decided to BLAST them on NCBI, and again they hit
>> different sequences.
>> I then checked the parameters of each search and found that both
>> queries were length adjusted, i.e. some length was removed, namely
>> around 30 nucleotides.
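The roughly 30 nucleotides mentioned here come from BLAST's statistics rather than from trimming the sequence: the length adjustment (ell) estimates how long an alignment must be before it can score significantly, and it is subtracted from the query length and from each database sequence when the effective search space is computed. Below is a minimal Perl sketch that solves the classic textbook form ell = ln(K(m - ell)(n - N*ell))/H by fixed-point iteration. BLAST+ uses a refined variant with additional fitted parameters, so treat this as an approximation. The function name, the K and H values, and the database sizes in the usage line are illustrative assumptions, not the values NCBI used for these searches.

```perl
use strict;
use warnings;

# Hypothetical sketch (not Bioperl or NCBI code): solve
#     ell = ln( K * (m - ell) * (n - N*ell) ) / H
# for the length adjustment ell by fixed-point iteration.
#   m = query length, n = total database length,
#   N = number of database sequences,
#   K, H = Karlin-Altschul parameter and relative entropy of the scoring system.
sub length_adjustment {
    my %args = @_;
    my ($m, $n, $N, $K, $H) = @args{qw(query_len db_len db_seqs K H)};
    my $ell = 0;
    for (1 .. 100) {
        my $m_eff = $m - $ell;          # effective query length
        my $n_eff = $n - $N * $ell;     # effective database length
        die "adjustment exceeds sequence lengths\n"
            if $m_eff <= 0 || $n_eff <= 0;
        my $next = log($K * $m_eff * $n_eff) / $H;
        return $next if abs($next - $ell) < 1e-9;    # converged
        $ell = $next;
    }
    return $ell;    # did not settle within 100 rounds; last estimate
}

# Illustrative numbers only (assumed, not taken from the searches above).
my $ell = length_adjustment(query_len => 1442, db_len => 1e8,
                            db_seqs => 5e4, K => 0.46, H => 0.85);
printf "length adjustment ~ %.1f, effective query length ~ %.1f\n",
    $ell, 1442 - $ell;
```

The iteration converges quickly because the right-hand side changes very little as ell moves; the BLAST report itself shows the adjustment as a whole number.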
>>
>> I was curious to see what Bioperl does about that, so I looked and
>> found the following in BlastUtils.pm:
>>
>>    # Adjust length based on BLAST flavor.
>>    my $prog = $sbjct->algorithm;
>>    if ($prog eq 'TBLASTN') {
>>        $sbjct->{'_length_aln_sbjct'} /= 3;
>>    } elsif ($prog eq 'BLASTX') {
>>        $sbjct->{'_length_aln_query'} /= 3;
>>    } elsif ($prog eq 'TBLASTX') {
>>        $sbjct->{'_length_aln_query'} /= 3;
>>        $sbjct->{'_length_aln_sbjct'} /= 3;
>>    }
>
> You are confusing the length adjustment that happens at NCBI with
> this length adjustment. The code above deals with translated
> searches; notice that every branch divides by 3. That is because the
> coordinates reported in the BLAST results for a translated search are
> the original DNA/RNA coordinates, but when you want to know the length
> in alignment space, it is really at the protein scale.
>
> So this is not the adjustment you seem to be looking for.
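To make the point about translated searches concrete, here is a minimal Perl sketch of the same branch logic as the quoted BlastUtils.pm excerpt, written as a standalone function. `protein_scale_lengths` is a hypothetical name, and it works on plain numbers rather than on a Bioperl hit object.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the quoted branch logic: for translated
# searches the reported coordinates are in nucleotides, so the translated
# side(s) are divided by 3 to express lengths in amino-acid residues.
sub protein_scale_lengths {
    my ($prog, $len_query, $len_sbjct) = @_;
    $prog = uc $prog;
    if ($prog eq 'TBLASTN') {          # nucleotide subject is translated
        $len_sbjct /= 3;
    } elsif ($prog eq 'BLASTX') {      # nucleotide query is translated
        $len_query /= 3;
    } elsif ($prog eq 'TBLASTX') {     # both sides are translated
        $len_query /= 3;
        $len_sbjct /= 3;
    }
    # BLASTN and BLASTP fall through unchanged: nothing is translated.
    return ($len_query, $len_sbjct);
}

my ($q, $s) = protein_scale_lengths('BLASTX', 300, 100);
print "BLASTX: query $q, subject $s\n";
($q, $s) = protein_scale_lengths('BLASTN', 1442, 1442);
print "BLASTN: query $q, subject $s\n";
```

The BLASTN case shows why this code has nothing to do with the statistical length adjustment reported by NCBI: for an untranslated search it leaves the lengths alone.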
>>
>> But it seems there is no length adjustment for blastn like the one
>> that seems to exist on NCBI.
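In BLAST the length adjustment enters the E-value calculation, which is done by the BLAST program itself: the query and database lengths are shrunk to an effective search space m'n' before the E-value is computed. A minimal Perl sketch of that calculation follows, using the standard Karlin-Altschul formulas E = K * m' * n' * exp(-lambda * S) and bits = (lambda * S - ln K) / ln 2. The subroutine names, the database sizes, and the lambda = 1.28, K = 0.46 values are assumptions for illustration; with those two parameters a raw score of 1439 gives about 2658.4 bits, in line with the 2658 bits (1439) quoted in the original message.

```perl
use strict;
use warnings;

# Hypothetical helpers (not Bioperl API) for Karlin-Altschul statistics.
# $ell is the length adjustment; $N is the number of database sequences.
sub effective_search_space {
    my ($m, $n, $N, $ell) = @_;
    my $m_eff = $m - $ell;          # effective query length
    my $n_eff = $n - $N * $ell;     # effective database length
    # Sketch only: keep the space positive; real BLAST has its own floor rules.
    $m_eff = 1 if $m_eff < 1;
    $n_eff = 1 if $n_eff < 1;
    return $m_eff * $n_eff;
}

# E = K * m' * n' * exp(-lambda * S) for raw score S.
sub evalue {
    my ($S, $lambda, $K, $space) = @_;
    return $K * $space * exp(-$lambda * $S);
}

# Normalised (bit) score: (lambda * S - ln K) / ln 2.
sub bit_score {
    my ($S, $lambda, $K) = @_;
    return ($lambda * $S - log($K)) / log(2);
}

my ($lambda, $K) = (1.28, 0.46);    # assumed values for illustration
printf "%.1f bits\n", bit_score(1439, $lambda, $K);

# Same raw score, with and without a 30-residue length adjustment.
my $raw = effective_search_space(1442, 1e8, 5e4, 0);
my $adj = effective_search_space(1442, 1e8, 5e4, 30);
printf "E without adjustment: %.3g, with: %.3g\n",
    evalue(30, $lambda, $K, $raw), evalue(30, $lambda, $K, $adj);
```

A smaller effective search space gives a smaller E-value for the same raw score; the bit score does not depend on the search space at all.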
>>
>> It's kind of frustrating, as I am trying to do some differential
>> expression analysis with my own scripts. If these 2 seqs are so
>> nearly identical they should have the same annotation, but they do
>> not because of these strange BLAST results.
>
> I have no idea what you mean by the rest of this as it relates to
> your candidate RNA sequences, or what you are trying to find out
> from the BLAST searches to help you on that front.
>>
>> I am really sorry if my post is a bit messy. If you have any
>> questions about what I meant, please ask.
>>
>> Any comments would be greatly appreciated!
>>
>> Cheers
>> D.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
