[Bioperl-l] remoteblast xml problem

Chris Fields cjfields at uiuc.edu
Sat Jun 3 04:35:21 UTC 2006


On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:

> hi Chris,
> thanks, but I never intended to run remoteblast with that many
> sequences, only a few of them. Actually, my goal is to run PHI-BLAST
> with a regular expression, so I won't need that file anymore.

Not a problem. Just to let you know, I did manage to get the script
working, so I'm marking the bug INVALID. I don't think the problem is
an infinite loop so much as that setting composition-based statistics
makes the search take much, much longer; try removing that line to see
what I mean.
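If you'd rather have the script give up than poll forever while a slow search (or a server that never sends data) keeps it waiting, one option is to cap the number of polls. The helper below is a hypothetical sketch, not part of RemoteBlast. It assumes the return convention used in the RemoteBlast synopsis loop: retrieve_blast() hands back a reference once results are ready, a negative number on error, and anything else while the search is still running. `$fetch` stands in for something like `sub { $factory->retrieve_blast($rid) }`.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: call $fetch up to $max_tries times, sleeping
# $delay seconds between calls. Assumed return convention (as in the
# RemoteBlast synopsis loop): a reference means results are ready, a
# negative number means an error, anything else means "still waiting".
sub poll_with_limit {
    my ($fetch, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $rc = $fetch->();
        return ('done', $rc) if ref $rc;
        return ('error', $rc)
            if defined $rc && looks_like_number($rc) && $rc < 0;
        # Still waiting; no need to sleep after the final attempt.
        sleep $delay if $delay && $try < $max_tries;
    }
    return ('timeout', undef);
}
```

For example, `poll_with_limit(sub { $factory->retrieve_blast($rid) }, 120, 5)` would keep polling for roughly ten minutes before returning 'timeout'; on 'error' or 'timeout' you would presumably call `$factory->remove_rid($rid)`, as the synopsis does for errors.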

Just so you know, $result->query_name doesn't give you what you would
expect: it returns part of the RID, which you don't want (this comes
from the XML output and is beyond our control). You might want to name
the output files some other way, or you'll end up with purely numerical
filenames.
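One workaround, assuming your loop submits one sequence at a time and still has that Bio::Seq object in scope (as `$input` is in the RemoteBlast synopsis), is to build the filename from the sequence's own ID via `$input->display_id` rather than from `$result->query_name`. The `output_filename_for` helper below is hypothetical; it only replaces characters that are awkward in filenames (such as the `|` in many FASTA IDs) and appends a suffix.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a sequence ID into a safe output filename.
# Any character other than a letter, digit, '.', '_' or '-' becomes '_'.
sub output_filename_for {
    my ($id, $suffix) = @_;
    $suffix = '.xml' unless defined $suffix;
    (my $name = $id) =~ s/[^A-Za-z0-9._-]/_/g;
    return $name . $suffix;
}

# In a RemoteBlast loop this could replace the query_name-based name, e.g.
#   $factory->save_output(output_filename_for($input->display_id));
```

Note that two different IDs can map to the same filename after sanitizing, so check for existing files if that matters for your input.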

> another question about parsing the XML output: is there an XML parser
> available for BLAST XML output, or how do I start? I have looked at the
> BioPerl wiki and at Bio::SearchIO::blastxml on CPAN, but I'm not sure
> how to start. Sorry, I guess I'm too stupid... Is there maybe another
> introduction or an example?

Bio::SearchIO objects are used to parse BLAST XML output if you have  
it saved to a file.  For instance:

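use strict;
use warnings;
use Bio::SearchIO;

my $file    = shift or die "usage: $0 report.xml\n";
my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');

while (my $result = $factory->next_result) {
   while (my $hit = $result->next_hit) {
      while (my $hsp = $hit->next_hsp) {
         # do stuff here, e.g. print hit name, HSP score and e-value
         print join("\t", $hit->name, $hsp->score, $hsp->evalue), "\n";
      }
   }
}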

The only thing that changes between parsing a text BLAST report and an
XML BLAST report is the -format value ('blast' vs. 'blastxml'), similar
to the -readmethod parameter in RemoteBlast. You shouldn't need any
documentation beyond these pages on the wiki:

http://www.bioperl.org/wiki/HOWTO:SearchIO

http://www.bioperl.org/wiki/Module:Bio::SearchIO

http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml

Note that you'll need to install XML::SAX (from CPAN), and that
XML::SAX::ExpatXS (along with the expat library) is highly recommended
for speeding up parsing.
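A quick way to see whether these modules are already installed is to try loading them. The `module_available` helper below is a small hypothetical sketch using only core Perl (`require` inside `eval`); it only checks whether the Perl modules load.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if $module can be loaded, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # XML::SAX -> XML/SAX.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $m (qw(XML::SAX XML::SAX::ExpatXS Bio::SearchIO)) {
    printf "%-20s %s\n", $m, module_available($m) ? 'installed' : 'missing';
}
```

From the shell, `perl -MXML::SAX::ExpatXS -e 1` does the same check for a single module (it exits with an error if the module can't be loaded).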

Chris

> thanks
> Hubert
>
>
> Chris Fields wrote:
>> Yes, I see the same error you do. But I have a similar script
>> (blastp, XML BLAST report, XML parsing, similar loop structure) that
>> works fine. I'm trying to dissect the problem, but I think it may be
>> something logically wrong here (something not so obvious) rather than
>> a bug...
>>
>> What I'm trying to say is that when you send sequences using
>> remoteblast like this, you are essentially spamming the NCBI BLAST
>> server with ~1600 requests. This script wasn't set up with that in
>> mind; you should really try to set up your own local BLAST database if
>> possible. If you can't, try running this script during off-hours
>> (10pm-6am EST or thereabouts).
>>
>>
>> Chris
>>
>> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:
>>
>>
>>> hi,
>>> input database: swissprot
>>>         matrix: pam30
>>>         count: 1
>>>         gapcosts: 9 1
>>>
>>> I know that there are a lot of sequences, but that doesn't matter;
>>> you can delete all of them except one. The number of sequences is not
>>> the problem: the script reads one line and submits it, then the second
>>> line, and so on. I have also tried it with only one sequence and got
>>> the same result. The script ran for more than 20 minutes that time,
>>> which should be enough time to retrieve the results for ONE sequence,
>>> I guess.
>>>
>>> regards
>>> Hubert
>>>
>>>
>>>
>>> Chris Fields wrote:
>>>
>>>> You need to add the input conditions as well (you have several
>>>> <STDIN> lines which may play a role; I would like to know what you
>>>> normally enter for those).
>>>>
>>>> How long did you let the script run? I ran a quick check on your
>>>> sequences; you have almost 1600, so you have to expect that you'll
>>>> run into some problems here! Most people here (including me) would
>>>> suggest installing a local BLAST setup for something like this.
>>>>
>>>> Chris
>>>>
>>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
>>>>
>>>>
>>>>> hi,
>>>>> I have submitted the bug (Bug 2017) with the script and input
>>>>> file; just start it from the command line.
>>>>>
>>>>> thank you very much
>>>>> greetings
>>>>>
>>>>> Hubert
>>>>>
>>>>> Chris Fields wrote:
>>>>>
>>>>>> Hubert,
>>>>>>
>>>>>> I have a script that's using blastxml and XML output which  
>>>>>> seems  to work.
>>>>>> I'll try looking at it to get a better idea this weekend.
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>>>>>> Sent: Friday, June 02, 2006 4:12 PM
>>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields; 'Sendu Bala'
>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>
>>>>>>> hi,
>>>>>>> sorry, but I have updated the remoteblast module and made several
>>>>>>> attempts, with the same results as before. It didn't work; I didn't
>>>>>>> get any results.
>>>>>>>
>>>>>>> regards
>>>>>>> Hubert
>>>>>>>
>>>>>>>
>>>>>>> Chris Fields wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Sendu, Hubert,
>>>>>>>>
>>>>>>>>
>>>>>>>> Hubert, your code looks fine, so Sendu's patch should fix the
>>>>>>>> problem (break out of that infinite loop). I applied Sendu's patch
>>>>>>>> to RemoteBlast in CVS; it passed all tests in RemoteBlast.t. Try
>>>>>>>> updating from CVS to see if it works.
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>
>>>>>>>>> Hubert Prielinger wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> hi,
>>>>>>>>>> I have the following program, and it worked quite well for
>>>>>>>>>> retrieving remoteblast results in a text file. Now I have altered
>>>>>>>>>> it to use XML, and it doesn't work anymore. It takes all the
>>>>>>>>>> parameters at the command line and submits the query, but I don't
>>>>>>>>>> get any results file anymore.
>>>>>>>>>>
>>>>>>>>>> It seems to hang in an endless loop. The only output I get is
>>>>>>>>>> "$rc is not a ref!", over and over; it doesn't enter the else
>>>>>>>>>> branch anymore.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> There is no problem with your code. The problem is with the NCBI
>>>>>>>>> server and should be reported to them. You can visit the site and
>>>>>>>>> do a BLAST search requesting XML format: you will typically get one
>>>>>>>>> normal 'waiting' message and the promise that it will be updated in
>>>>>>>>> x seconds, but subsequent attempts to get progress information
>>>>>>>>> result in an XML error page, because the NCBI server doesn't
>>>>>>>>> actually send any data.
>>>>>>>>>
>>>>>>>>> Unfortunately, the way the BioPerl code is written, it treats no
>>>>>>>>> data as 'waiting' instead of as an error. I've offered a patch to
>>>>>>>>> fix this at this bug page:
>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioperl-l mailing list
>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>>
>>
>>
>>
>>
>






