[Bioperl-l] remoteblast xml problem
Chris Fields
cjfields at uiuc.edu
Sat Jun 3 04:35:21 UTC 2006
On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:
> hi chris,
> thanks, but I never intended to run remoteblast with so many sequences,
> only a few of them. Actually, my goal is to run phiblast with a
> regular expression, so I just don't need that
> file anymore
Not a problem. Just to let you know, I did manage to get the script
working, so I'm marking the bug INVALID. I don't think the problem is
an infinite loop so much as that setting composition-based statistics
makes the search take much, much longer; try removing that line to see
what I mean.
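If a slow search (for example, one with composition-based statistics turned on) is hard to tell apart from a real hang, one option is to cap how long the script keeps polling. This is a minimal sketch, not RemoteBlast's own behaviour: poll_with_limit is a hypothetical helper, and it assumes the return convention used in the RemoteBlast synopsis loop, where retrieve_blast($rid) returns an object when the report is ready, a negative number on error, and a non-reference otherwise while the search is still running.

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch until it yields a reference (a finished
# report). Gives up after $max_tries attempts, sleeping $interval seconds
# between attempts. Convention assumed from the RemoteBlast synopsis:
# reference = ready, negative number = error, anything else = waiting.
sub poll_with_limit {
    my ($fetch, $max_tries, $interval) = @_;
    for my $try (1 .. $max_tries) {
        my $rc = $fetch->();
        return ($rc, 'ready') if ref $rc;
        return (undef, 'error') if defined $rc && $rc < 0;
        sleep $interval if $interval && $try < $max_tries;
    }
    return (undef, 'timeout');
}

# Demo with a fake fetch that reports "waiting" twice, then a result.
my @replies = (0, 0, { report => 'done' });
my ($res, $status) = poll_with_limit(sub { shift @replies }, 5, 0);
print "$status\n";    # prints "ready"

# In a RemoteBlast loop this might look like (untested sketch):
#   my ($rc, $status) =
#       poll_with_limit(sub { $factory->retrieve_blast($rid) }, 360, 5);
#   $factory->remove_rid($rid);
#   warn "RID $rid: $status\n" unless $status eq 'ready';
```

With 360 tries at 5 seconds each the script waits up to 30 minutes before reporting a timeout; pick a limit that suits how long your searches normally take.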
Just so you know, $result->query_name doesn't give you what you would
expect: it returns part of the RID, which you don't want (this comes
from the XML output and is beyond our control). You might want to use
something else, or you'll end up with purely numerical filenames.
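One way to avoid numeric filenames is to name each report after the sequence you submitted instead of $result->query_name. Assuming your loop submits one sequence at a time (as the RemoteBlast synopsis does), the submitted sequence object is still in scope when its result arrives, so its display_id can supply the name. report_filename below is a hypothetical helper that only replaces characters that are awkward in filenames.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a sequence ID such as "sp|P69905|HBA_HUMAN"
# into a safe filename by replacing runs of anything other than letters,
# digits, dot, dash and underscore with "_".
sub report_filename {
    my ($id, $ext) = @_;
    $ext = 'xml' unless defined $ext;
    (my $safe = $id) =~ s/[^A-Za-z0-9._-]+/_/g;
    return "$safe.$ext";
}

print report_filename('sp|P69905|HBA_HUMAN'), "\n";
# prints "sp_P69905_HBA_HUMAN.xml"

# Inside the RemoteBlast loop (sketch, assuming one sequence per RID):
#   $factory->save_output(report_filename($input->display_id));
```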
> Another question, about parsing the XML output: is there an XML
> parser available for BLAST XML output, or how do I start?
> I have looked at the BioPerl wiki and at Bio::SearchIO::blastxml on CPAN,
> but I'm not sure how to start... sorry, I guess I'm too stupid...
> is there maybe another introduction or an example?
Bio::SearchIO objects are used to parse BLAST XML output if you have
it saved to a file. For instance:
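use strict;
use warnings;
use Bio::SearchIO;

my $file = shift or die "usage: $0 report.xml\n";
my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');
while (my $result = $factory->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            # do stuff here, e.g. print one tab-separated line per HSP
            print join("\t", $hit->name, $hsp->evalue,
                       $hsp->percent_identity), "\n";
        }
    }
}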
The only thing that changes between parsing a text BLAST report and an
XML BLAST report is the -format value ('blast' vs. 'blastxml'), much
like the -readmethod parameter in RemoteBlast. You shouldn't need any
documentation beyond these pages on the wiki:
http://www.bioperl.org/wiki/HOWTO:SearchIO
http://www.bioperl.org/wiki/Module:Bio::SearchIO
http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml
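Because only the -format value differs, one script can handle both kinds of report. A minimal sketch, assuming (hypothetically) that XML reports are saved with an .xml extension and text reports with anything else; searchio_format is an illustrative name, and the strings it returns are the Bio::SearchIO format names 'blastxml' and 'blast'.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the Bio::SearchIO -format value from the file
# extension. Assumes XML reports end in ".xml"; everything else is treated
# as a plain-text BLAST report.
sub searchio_format {
    my ($file) = @_;
    return $file =~ /\.xml\z/i ? 'blastxml' : 'blast';
}

for my $file ('P69905.xml', 'P69905.out') {
    print "$file => ", searchio_format($file), "\n";
}

# With BioPerl installed, the parser would then be created as:
#   my $in = Bio::SearchIO->new(-file   => $file,
#                               -format => searchio_format($file));
```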
Note that you'll need to install XML::SAX (from CPAN); XML::SAX::ExpatXS
(and Expat) is highly recommended to speed up parsing.
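To check which SAX parsers XML::SAX has registered, and whether XML::SAX::ExpatXS is among them, you can ask XML::SAX directly. This sketch relies on the XML::SAX->parsers class method (a reference to a list of hashes with a Name key) and prints a message instead of dying if XML::SAX is not installed; installed_sax_parsers is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: list the parser classes XML::SAX has registered.
# Returns an empty list if XML::SAX is missing or reports no parsers.
sub installed_sax_parsers {
    return () unless eval { require XML::SAX; 1 };
    my $parsers = eval { XML::SAX->parsers() } || [];
    return map { $_->{Name} } @$parsers;
}

my @names = installed_sax_parsers();
if (!@names) {
    print "XML::SAX (or any registered SAX parser) not found\n";
}
else {
    my $has_expatxs = grep { $_ eq 'XML::SAX::ExpatXS' } @names;
    print "Registered SAX parsers: @names\n";
    print "XML::SAX::ExpatXS is ", ($has_expatxs ? "" : "NOT "),
        "installed\n";
}
```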
Chris
> thanks
> Hubert
>
>
> Chris Fields wrote:
>> Yes, I see the same error you do. But I have a similar script
>> (blastp, XML blast report, XML parsing, similar loop structure)
>> that works fine. I'm trying to dissect the problem but I think
>> it may be something logically wrong here (something not so
>> obvious) and not a bug...
>>
>> What I'm trying to say is, when you send sequences using
>> remoteblast like this, you are essentially spamming the NCBI
>> BLAST server with ~1600 requests. This script wasn't set up with
>> that intent in mind; you should really try to set up your own
>> local blast database if possible. If you can't, try running this
>> script in off-hours (10pm-6am EST or something like that).
>>
>>
>> Chris
>>
>> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:
>>
>>
>>> hi,
>>> input database: swissprot
>>> matrix: pam30
>>> count: 1
>>> gapcosts: 9 1
>>>
>>> I know that there are a lot of sequences, but that doesn't
>>> matter; you can delete all of them except one. The number of
>>> sequences is not the problem: the script reads one line and
>>> submits it, then the second line, and so on. I have also tried
>>> it with only one sequence and got the same result; the script
>>> ran for more than 20 minutes, and that should be enough time to
>>> retrieve the results for ONE sequence, I guess
>>>
>>> regards
>>> Hubert
>>>
>>>
>>>
>>> Chris Fields wrote:
>>>
>>>> You need to add the input conditions as well (you have several
>>>> <STDIN> lines which may play a role; I would like to know what
>>>> you normally enter for those).
>>>>
>>>> How long did you let the script run? I ran a quick check on
>>>> your sequences; you have almost 1600, so you have to expect
>>>> that you'll run into some problems here! Most here (including
>>>> me) would suggest you try installing a local blast setup for
>>>> something like this.
>>>>
>>>> Chris
>>>>
>>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
>>>>
>>>>
>>>>> hi,
>>>>> I have submitted the bug (Bug 2017) with the script and input
>>>>> file; just start it from the command line
>>>>>
>>>>> thank you very much
>>>>> greetings
>>>>>
>>>>> Hubert
>>>>>
>>>>> Chris Fields wrote:
>>>>>
>>>>>> Hubert,
>>>>>>
>>>>>> I have a script that's using blastxml and XML output which
>>>>>> seems to work.
>>>>>> I'll try looking at it to get a better idea this weekend.
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>>>>>> Sent: Friday, June 02, 2006 4:12 PM
>>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields;
>>>>>>> 'Sendu Bala'
>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>
>>>>>>> hi,
>>>>>>> sorry, but I have updated the remoteblast module and made
>>>>>>> several attempts, with the same results as before. It didn't
>>>>>>> work; I didn't get any results.
>>>>>>>
>>>>>>> regards
>>>>>>> Hubert
>>>>>>>
>>>>>>>
>>>>>>> Chris Fields wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Sendu, Hubert,
>>>>>>>>
>>>>>>>>
>>>>>>>> Hubert, your code looks fine so Sendu's patch should fix the
>>>>>>>> problem (break out of that infinite loop). I applied Sendu's
>>>>>>>> patch to RemoteBlast in CVS; it passed all tests in
>>>>>>>> RemoteBlast.t. Try updating from CVS to see if it works.
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>
>>>>>>>>> Hubert Prielinger wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> hi,
>>>>>>>>>> I have the following program, and it worked quite well for
>>>>>>>>>> retrieving remoteblast results in a text file. Now I have
>>>>>>>>>> altered it to XML, and it doesn't work anymore...
>>>>>>>>>> It takes all the parameters at the command line and submits
>>>>>>>>>> the query, but I don't retrieve any results file anymore...
>>>>>>>>>>
>>>>>>>>>> It seems that it hangs in an endless loop...
>>>>>>>>>> the only output I get is "$rc is not a ref!" over and over;
>>>>>>>>>> it doesn't enter the else branch anymore...
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> There is no problem with your code. The problem is with
>>>>>>>>> the NCBI server
>>>>>>>>> and should be reported to them. You can visit the site and
>>>>>>>>> do a blast,
>>>>>>>>> requesting xml format, and you will typically get one
>>>>>>>>> normal 'waiting'
>>>>>>>>> message and the promise that it will be updated in x
>>>>>>>>> seconds, but
>>>>>>>>> subsequent attempts to get progress information result in
>>>>>>>>> an xml error
>>>>>>>>> page because the NCBI server doesn't actually send any data.
>>>>>>>>>
>>>>>>>>> Unfortunately, the way the BioPerl code is written, it treats
>>>>>>>>> no data as 'waiting' instead of as an error. I've offered a
>>>>>>>>> patch to fix this at this bug page:
>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioperl-l mailing list
>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>