[Bioperl-l] Remote blast
Roopa Raghuveer
rtbio.2009 at gmail.com
Fri Nov 20 15:52:09 UTC 2009
Hello everybody,
I have tried to use Remote blast on Trypanasoma brucei sequences and could
get certain hits.But I am unable to retrieve the complete sequence from
where I got hits.
i.e., I am unable to parse the blast output file for getting the complete
sequences of the hits. Here is my code.
#!/usr/bin/perl -w
use Bio::SearchIO;
my $blast_report = new Bio::SearchIO ('-format' => 'blast',
'-file' => $ARGV[0]);
my $result = $blast_report->next_result;
my $level = $ARGV[1];
while( my $hit = $result->next_hit) {
print $hit->name;
push(@arr1,$hit->name);
while( my $hsp = $hit->next_hsp()) {
if ($hsp->frac_identical() >= $level) {
#print $hsp->hit_string, "\n";
push(@arr,$hsp->hit_string);
}
}
}
$k=@arr1;
for($i=0;$i<$k;$i++){
push(@arr2,split(/|/,$arr1[$i]));
#print "$arr[$i]\n";
}
#$t=@arr2;
Here,I am trying to use the blast output file and get the complete sequence
where I found a hit but I could not get the complete sequence.
i/p:-
Last login: Mon Nov 16 11:57:22 on console
Welcome to Darwin!
lmbicip-mac1:~ cip$ ssh admin at 141.84.66.66
The authenticity of host '141.84.66.66 (141.84.66.66)' can't be established.
RSA key fingerprint is 2d:4a:09:1d:2e:f3:51:c7:ba:8b:29:37:36:f6:44:db.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '141.84.66.66' (RSA) to the list of known hosts.
Password:
Last login: Fri Nov 20 13:52:57 2009 from 10.153.189.239
Have a lot of fun...
admin at BosLinux:~> clear
admin at BosLinux:~> cd Documents/
admin at BosLinux:~/Documents> clear
admin at BosLinux:~/Documents> vim blast.pl
admin at BosLinux:~/Documents> clear
admin at BosLinux:~/Documents> vim nnn.pl
admin at BosLinux:~/Documents> vim other.pl
admin at BosLinux:~/Documents> vim amino.fa
admin at BosLinux:~/Documents> vim Tb09.211.2410.out
admin at BosLinux:~/Documents> vim Tb09.211.2410.out
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 661 TTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGCTTAAATTCCCC
720
Query 721 AATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACG
780
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 721 AATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACG
780
Query 781 AAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGT
840
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 781 AAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGT
840
Query 841 GGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTG
900
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 841 GGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTG
900
Query 901 AAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCT
960
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 901 AAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCT
960
Query 961 CCTCCACTAACCCCTTCGCAACAGGTTGCATTCCGTGGTTTTTAG 1005
|||||||||||||||||||||||||||||||||||||||||||||
Sbjct 961 CCTCCACTAACCCCTTCGCAACAGGTTGCATTCCGTGGTTTTTAG 1005
>ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A catalytic
subunit
isoform 2 (Tb09.211.2360) partial mRNA
Length=1011
Score = 1622 bits (1798), Expect = 0.0
Identities = 944/974 (96%), Gaps = 0/974 (0%)
Strand=Plus/Plus
Query 32 TGTTTACCAAGCCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGC
91
|||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 38 TGTTTACCAAACCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGC
97
Query 92 TAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATT
151
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 98 TAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATT
157
Query 152 ATGCAATAAAATGTCTAAAGAAGCATGAGATACTAAAGATGAAGCAGGTACAACACCTGA
211
|||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
Sbjct 158 ATGCAATAAAATGTCTAAAGAAGCGTGAGATACTAAAGATGAAGCAGGTACAACACCTGA
217
Query 212 ACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTT
271
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 218 ACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTT
277
uery 272 CCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTAT
331
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 278 CCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTAT
337
Query 332 TTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGG
391
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 338 TTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGG
397
Query 392 AGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAAC
451
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 398 AGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAAC
457
Query 452 CTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTA
511
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 458 CTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTA
517
Query 512 AGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGG
571
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 518 AGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGG
577
Query 572 TAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGT
631
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
It follows like this.
The output I got is
ATGACGACAACTCCCACTGGTGATGGCCAACTGTTTACCAAGCCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGCTAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATTATGCAATAAAATGTCTAAAGAAGCATGAGATACTAAAGATGAAGCAGGTACAACACCTGAACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTTCCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTATTTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGGAGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAACCTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTAAGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGGTAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGTATGAATTCATAGCTGGCCATCCTCCCTTTTTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGCTTAAATTCCCCAATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACGAAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGTGGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTGAAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCTCCTCCACTAACCCCTTCGCAACAGG
TTGCATTCCGTGGTTTTTAG
TGTTTACCAAACCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGCTAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATTATGCAATAAAATGTCTAAAGAAGCGTGAGATACTAAAGATGAAGCAGGTACAACACCTGAACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTTCCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTATTTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGGAGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAACCTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTAAGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGGTAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGTATGAATTCATAGCTGGCCATCCTCCCTTTTTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGTTCAAATTCCCCAATTGGTTTGACTCCCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACGAAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGTGGTGCGAATTGGGAGAAACTCTATGGACGTCATTATCACGCTCCCATTCCTGTAAAAGTGAAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGGGATAAGCGGTTGCCCCCGTTAGCACCATCACAACAATTGGAGTTCCGTGGGTTTTAG
GGATGATGACCGATTGTACCTCCTCCTCGAGTATGTGGTGGGTGGCGAGCTGT
TCTCCCACCTCCGGAAGGCGGGAAAATTCCCTAATGATGTAGCCAAGTTCTACTCCGCAGAAGTGGTTTTGGCGTTTGAATATATTCATGAGTGCGGCATCGTATACCGTGACTTGAAGCCAGAAAATGTGCTTTTGGACAAGCAGGGAAACATTAAGATTACGGACTTTGGGTTCGCGAAACGCGTTAGGGACAGAACGTACACGCTATGTGGGACTCCAGAGTATCTTGCGCCGGAGATAATCCAAAGTAAAGGTCACGATCGGGCTGTGGATTGGTGGACACTCGGAATTCTTCTCTATGAGATGCTTGTCGGTTATCCTCCTTTTTTCGACGAGAGTCCTTTTAGAACATACGAAAAAATTTTAGAGGGGAAACTTCAGTTTCCAAAGTGGGTGGAGATGCGGGCGAAGGACCTCATAAAGAGTTTTTTAACAATTGAACCAACGAAACG
i.e.,It is only giving the region where it could find the best alignment
i.e., the best hit ones.
I want the complete sequence i.e., sequences corresponding to the accession
numbers
XM_822292.1
XM_822286.1
XM_822694.1
Database used in Remote blast was RefSeq i.e.,(refseq_rna),organism used
:Trypanasoma brucei.
Can any one please help me in solving this problem
Regards,
Roopa.
On Fri, Nov 20, 2009 at 12:30 PM, Roopa Raghuveer <rtbio.2009 at gmail.com>wrote:
>
> Hello Roy,
>
> Thanks a lot for your reply.My code is working for my sequence now.
>
> Thanks alot.
>
> Regards,
> Roopa.
>
> On Thu, Nov 19, 2009 at 5:10 PM, Roy Chaudhuri <roy.chaudhuri at gmail.com>wrote:
>
>> Hi Roopa,
>>
>> I think that the -Organism parameter that you specify for
>> Bio::Tools::Run::RemoteBlast is ignored - I can't find any reference to it
>> in the documentation:
>>
>> http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm<http://search.cpan.org/%7Ecjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm>
>>
>> You have the correct approach in your code - limiting the search to the
>> Entrez query "Trypanosoma brucei[ORGN]", but the line is commented out. If
>> you uncomment the line (and add a semicolon afterwards), the program runs
>> correctly, but no hits are reported below your threshold e-value. If you
>> change the value of $e_val to 10 then some T.brucei hits are reported.
>>
>> Roy.
>>
>> Roopa Raghuveer wrote:
>>
>>> Hello everybody,
>>>
>>> I have a problem. I would like to use remote blast to find sequences
>>> matching for an input sequence.
>>>
>>> Ex:-I would like to search sequences which match Trypanosoma Brucei
>>> sequence.
>>>
>>> I want the output to be only Trypanosoma Brucei sequences matching with
>>> my
>>> query.When i tried to use remoteblast to nr database,I got sequences from
>>> different organisms like E.coli,Pseudomonas etc.,
>>>
>>> Could you please tell me how can this be solved...?
>>>
>>> My code is as follows.
>>>
>>> use Bio::Tools::Run::RemoteBlast;
>>> use strict;
>>> my $prog = 'blastn';
>>> my $db = 'nr';
>>> my $e_val= '1e-10';
>>> my $organism= 'Trypanosoma Brucei';
>>>
>>> my @params = ( '-prog' => $prog,
>>> '-data' => $db,
>>> '-expect' => $e_val,
>>> '-readmethod' => 'SearchIO',
>>> '-Organism' => $organism );
>>>
>>> my $factory = Bio::Tools::Run::RemoteBlast->
>>> new(@params);
>>>
>>> #change a paramter
>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
>>> brucei[ORGN]'
>>>
>>> #remove a parameter
>>> #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};
>>>
>>> my $v = 1;
>>> #$v is just to turn on and off the messages
>>>
>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , '-format' => 'fasta' ,
>>> '-organism' => 'Trypanosoma Brucei' );
>>>
>>> while (my $input = $str->next_seq()){
>>> #Blast a sequence against a database:
>>> my $r = $factory->submit_blast($input);
>>> #my $r = $factory->submit_blast('amino.fa');
>>>
>>> print STDERR "waiting..." if( $v > 0 );
>>> while ( my @rids = $factory->each_rid ) {
>>> foreach my $rid ( @rids ) {
>>> my $rc = $factory->retrieve_blast($rid);
>>> if( !ref($rc) ) {
>>> if( $rc < 0 ) {
>>> $factory->remove_rid($rid);
>>> }
>>> print STDERR "." if ( $v > 0 );
>>> sleep 5;
>>> }
>>> else {
>>> my $result = $rc->next_result();
>>> #save the output
>>> my $filename = $result->query_name()."\.out";
>>> $factory->save_output($filename);
>>> $factory->remove_rid($rid);
>>> print "\nQuery Name: ", $result->query_name(), "\n";
>>> while ( my $hit = $result->next_hit ) {
>>> next unless ( $v > 0);
>>> print "\thit name is ", $hit->name, "\n";
>>> while( my $hsp = $hit->next_hsp ) {
>>> print "\t\tscore is ", $hsp->score, "\n";
>>> }
>>> }
>>> }
>>> }
>>> }
>>> }
>>>
>>> My input sequence is
>>>
>>> ref|NC_009512.1|:385-1902
>>>>
>>> GTGTCAGTGGAACTTTGGCAGCAGTGCGTGGAGCTTCTGCGCGATGAACTGCCTGCCCAGCAATTCAACA
>>> CCTGGATCCGTCCGCTACAGGTCGAAGCCGAAGGCGACGAGTTGCGCGTCTATGCGCCTAACCGTTTCGT
>>> TCTCGATTGGGTCAATGAAAAGTACCTGGGTCGTTTGCTCGAGCTGTTGGGTGAGAACGGTAGCGGCATT
>>> GCACCAGCCCTTTCCTTATTAATAGGTAGCCGCCGCAGCTCGGCCCCAAGGGCTGCACCCAACGCGCCGG
>>> TCAGCGCTGCCGTTGCGGCTTCGCTGGCGCAGACTCAGGCGCACAAGACGGCCCCGGCAGCAGCGGTTGA
>>> ACCCGTTGCCGTGGCCGCGGCCGAGCCTGTATTGGTCGAGACGTCTTCGCGTGACAGCTTTGATGCCATG
>>> GCCGAGCCTGCTGCTGCGCCGCCCAGTGGTGGCCGGGCTGAACAGCGCACCGTGCAGGTTGAAGGTGCGC
>>> TCAAGCACACCAGTTACCTGAACCGGACCTTTACCTTTGACACCTTCGTCGAAGGTAAGTCGAACCAGCT
>>> CGCCCGCGCGGCTGCCTGGCAGGTTGCGGACAACCCTAAGCATGGCTACAACCCACTGTTCCTTTATGGC
>>> GGTGTGGGTTTGGGTAAAACCCACCTTATGCATGCTGTGGGTAACCATCTGCTGAAGAAGAATCCGAACG
>>> CCAAGGTGGTGTACCTGCATTCGGAGCGCTTCGTCGCGGACATGGTCAAAGCGTTGCAACTCAACGCCAT
>>> CAACGAATTCAAGCGCTTCTACCGCTCGGTGGACGCGTTGCTGATCGACGATATCCAGTTCTTCGCTCGC
>>> AAAGAGCGCTCGCAAGAAGAGTTTTTCCACACCTTCAACGCCTTGCTTGAGGGTGGCCAGCAGGTAATCC
>>> TTACCTCTGACCGCTATCCCAAGGAAATCGAAGGCCTGGAAGAGCGTCTGAAGTCGCGCTTTGGTTGGGG
>>> CCTGACGGTGGCTGTCGAGCCGCCAGAGCTGGAGACCCGCGTAGCGATCCTGATGAAGAAGGCCGACCAG
>>> GCCAAAGTCGAGCTCCCGCATGACGCAGCCTTTTTCATCGCTCAGCGCATCCGGTCCAACGTCCGTGAGC
>>> TGGAAGGTGCACTGAAGCGAGTTATTGCTCACTCGCACTTCATGGGGCGTGACATCACCATCGAGCTGAT
>>> TCGTGAATCGCTCAAGGATCTGTTGGCGCTGCAAGACAAACTGGTCAGTGTGGATAACATTCAGCGTACC
>>> GTCGCTGAGTACTACAAGATCAAGATCTCCGATCTGTTGTCCAAGCGTCGTTCGCGTTCTGTCGCGCGCC
>>> CGCGTCAGGTAGCCATGGCCCTGTCCAAGGAGTTGACCAACCACAGTCTGCCGGAAATCGGCGACATGTT
>>> CGGTGGTCGCGACCATACGACCGTGCTGCACGCCTGCCGCAAAATCAATGAACTGAAGGAATCCGACGCG
>>> GACATCCGCGAGGACTACAAGAACCTGCTGCGGACGCTGACGACCTGA
>>>
>>> Please mail me regarding any queries.
>>>
>>> Regards,
>>> Roopa.
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
>
More information about the Bioperl-l
mailing list