[Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blastoutput
Chris Fields
cjfields at uiuc.edu
Thu Feb 9 05:07:15 UTC 2006
On Feb 8, 2006, at 6:54 PM, Joel Steele wrote:
> Greetings,
> I'm not well versed in Bio::SearchIO but there are a few comments about
> your code that may or may not be relevant...
>
> first thing:
>
> =-=-=-=-=code snippet=-=-=-=-=
>
> #!/usr/bin/perl -w
> use strict; #save yourself the headaches and force yourself to
>             #write clean code.
>
> =-=-=-=-=code snippet=-=-=-=-=
>
Tread very carefully here. Just about every book on Perl suggests
'use strict' and adding warnings for code development (e.g. the Camel,
the Llama, and others); in fact, these are the very books most
beginners start from. Some would consider NOT using -w or 'use
strict' a bad habit; everybody has an opinion (I would repeat an
oft-heard Texas saying, but I'll refrain). Just remember: try to be a
little more constructive in your critique and insert a little less
of your personal coding style. If you hit the wrong person, you
might get flamed.
Here's a link that may help a bit here:
http://bioperl.org/Core/Latest/biodesign.html#respect_people_s_code__in_particular_if_it_works_
> next thing:
> when you are reading the files from the directory you are not doing any
> sort of filtering as to what is returned. If you are on a Unix-flavored
> system you may be getting the '.' and '..' entries from your readdir(DIR)
> call. I would suggest placing a grep in there somewhere to get only blast
> files. something like:
>
I agree here. You could probably also use something like File::Find
here to make things a bit easier with the file names as well; works
wonderfully, esp. when traversing a directory tree.
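
For what it's worth, here is a minimal sketch of the File::Find route. The sub name find_blast_files and the .bls extension are assumptions for illustration (the extension is borrowed from the snippet below), so adjust both to whatever your files actually look like:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Collect every regular file under $dir whose name ends in $ext.
# find_blast_files and the '.bls' extension are illustrative assumptions.
sub find_blast_files {
    my ($dir, $ext) = @_;
    my @found;
    find(
        sub {
            # $_ is the bare file name; File::Find has already chdir'ed
            # into the containing directory, so -f $_ tests the right file.
            # $File::Find::name is the path including the start directory.
            push @found, $File::Find::name if -f $_ && /\Q$ext\E$/;
        },
        $dir
    );
    my @sorted = sort @found;
    return @sorted;
}

# Usage: print the path of each report found under a directory tree.
# print "$_\n" for find_blast_files('/path/to/results', '.bls');
```

Because the returned names include the starting directory, they can be handed straight to -file without worrying about the current working directory.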
> =-=-=-=-=code snippet=-=-=-=-=
>
> #assuming the file extension for blast files is .bls
> #-f is a filetest (it implies -e, so -f alone is enough). readdir
> #returns bare names, so test the full path rather than just $_.
> #Here is a link for reference on the filetests available in Perl.
> #
> # http://www.perlmonks.org/?node_id=370
>
> my @files_to_parse = grep { /\.bls$/ && -f "$directory/$_" } readdir(DIR);
> closedir(DIR);
>
> #then proceed with your foreach but over @files_to_parse
>
> foreach my $file (@files_to_parse) {
>     my $path = "$directory/$file";  #full path to hand to SearchIO
>     #do cool stuff here...
> }
>
>
Again, agreed. But, does it really solve the main problem, which is
an issue with SearchIO::blast? It seemed to try parsing a blast file...
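
One way to tell a file-list problem from a parser problem is to wrap each parse in an eval and see exactly which files blow up. This is only a rough sketch: try_each_file and the $parse callback are hypothetical names, and the callback stands in for whatever parsing you actually do:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Run $parse on each regular file in $dir and report which ones die.
# try_each_file and $parse are hypothetical names for this sketch.
sub try_each_file {
    my ($dir, $parse) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = sort readdir($dh);
    closedir($dh);

    my (@ok, @failed);
    for my $name (@names) {
        # readdir returns bare names, so build the full path first.
        my $path = File::Spec->catfile($dir, $name);
        next unless -f $path;    # also skips '.' and '..'
        if (eval { $parse->($path); 1 }) {
            push @ok, $name;
        } else {
            push @failed, $name;
            warn "parse failed for $path: $@";
        }
    }
    return (\@ok, \@failed);
}

# With BioPerl installed, $parse could be something like:
#   sub {
#       my $in = Bio::SearchIO->new(-format => 'blast', -file => $_[0]);
#       while (my $result = $in->next_result) { }
#   }
```

If every real report lands in the failed list with the same "no data for midline" message, the file handling is fine and the problem is in the parser.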
> =-=-=-=-=code snippet=-=-=-=-=
>
> Hope that helps.
> -Joel Steele
>
>
> "The surest way to corrupt a youth is to instruct him to hold in higher
> regard those who think alike than those who think differently." - Nietzsche
>
> "I do not feel obliged to believe that the same God who endowed us with
> sense, reason and intellect has intended us to forego their use." - Galileo
>
>
>
>
>> From: Hubert Prielinger <hubert.prielinger at gmx.at>
>> To: Chris Fields <cjfields at uiuc.edu>, bioperl-l at bioperl.org,
>> rahall2 at ualr.edu
>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
>> Blastoutput
>> Date: Wed, 08 Feb 2006 16:22:44 -0600
>>
>> hi,
>> I have installed the Core, Run and Ext packages from the following page:
>> http://news.open-bio.org/archives/2005_10.html
>> I'm using only SearchIO, without the RemoteBlast module, because I
>> already have all my Blast output files.
>> My operating system is Fedora Core 4.
>>
>> Code:
>>
>> #!/usr/bin/perl -w
>>
>> use Bio::SearchIO;
>>
>> print "start program\n";
>> my $directory =
>>     "/home/Hubert/installed/eclipse/workspace/Database_Search/result_4";
>> opendir(DIR, $directory) || die("Cannot open directory");
>> print "opened directory\n";
>>
>> foreach my $file (readdir(DIR)) {
>>     print "read file\n";
>>
>>     my $search = new Bio::SearchIO (-format => 'blast',
>>                                     -file   => $file);
>>
>>     my $cutoff_len = 10;
>>
>>     #iterate over each query sequence
>>     while (my $result = $search->next_result) {
>>         print "entered 1st while loop\n";
>>
>>         #iterate over each hit on the query sequence
>>         while (my $hit = $result->next_hit) {
>>
>>             #iterate over each HSP in the hit
>>             while (my $hsp = $hit->next_hsp) {
>>
>>                 if ($hsp->length('sbjct') <= $cutoff_len) {
>>                     #print $hsp->hit_string, "\n";
>>                     for ($hsp->hit_string) {
>>
>>                         if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 ||
>>                             tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {
>>
>>                             # Print some tab-delimited data about this HSP
>>
>>                             open (bigShot, ">>BlastOutputTrial.txt") ||
>>                                 die ("Could not open file. $!");
>>                             #print $result->query_name, "\t";
>>
>>                             # print $hit->significance, "\t";
>>                             print bigShot $hit->name, "-->";
>>                             print bigShot $hit->description, "\n";
>>                             #print bigShot "Query: ",
>>                             #    $hsp->start('query'), " ", $hsp->query_string, " ",
>>                             #    $hsp->end('query'), "\n";
>>                             print bigShot "Seq: ", $hsp->start('hit'),
>>                                 " ", $hsp->hit_string, " ", $hsp->end('hit'), "\n";
>>
>>                             # print $hsp->rank, "\t";
>>                             # print $hsp->percent_identity, "\t";
>>                             # print $hsp->evalue, "\t";
>>                             # print $hsp->hsp_length, "\n";
>>
>>                             close (bigShot);
>>
>>                         };
>>
>>                     }
>>                 }
>>             }
>>         }
>>     }
>>
>> }
>>
>> closedir(DIR);
>>
>>
>> Chris Fields wrote:
>>
>>> Make sure you ran a full installation of bioperl-1.5.1 or bioperl-live
>>> (not just the modules you want; mixing bioperl versions might work, but
>>> you might run into interoperability problems). Then replace the
>>> Bio::SearchIO::blast with the one in Bugzilla. The 'other option' you
>>> mentioned might be trying XML instead of text, which is more stable in
>>> the long run. You will still need to run a full upgrade to bioperl 1.5.1
>>> for that; make sure you read this:
>>>
>>> http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast
>>>
>>> If you're using SearchIO directly instead of RemoteBlast, you should be
>>> able to set the '-format' flag to 'blastxml'.
>>>
>>> It also wouldn't hurt to know what OS you're using or see some code.
>>> Roger is out there somewhere (I think) and may also have some input.
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher - Switzer Lab
>>> Dept. of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Hubert Prielinger [mailto:hubert.prielinger at gmx.at]
>>>> Sent: Wednesday, February 08, 2006 3:41 PM
>>>> To: Chris Fields; bioperl-l at bioperl.org
>>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>> parsing Blast output
>>>>
>>>> hi chris,
>>>> thanks, I have upgraded to version 1.5.1 but it still isn't
>>>> working. Do you have any other idea? The problem I have is
>>>> that I have to parse a lot of text files....
>>>> or shall I look for another option to parse those files...
>>>>
>>>> regards
>>>> Hubert
>>>>
>>>>
>>>>
>>>> Chris Fields wrote:
>>>>
>>>>
>>>>
>>>>> My guess is you're running into text parsing problems in
>>>>> Bio::SearchIO::blast. Upgrade to the latest developer version (1.5.1)
>>>>> or bioperl-live (CVS), then see the bug below.
>>>>>
>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>
>>>>> I think the first problem you ran into is solved in bioperl 1.5.1, the
>>>>> last problem (more recent, not related to the first) has been fixed but
>>>>> hasn't been committed to bioperl-live yet. The fixed SearchIO::blast
>>>>> is available in the link above, but realize it hasn't been committed
>>>>> yet and may change.
>>>>>
>>>>> Christopher Fields
>>>>> Postdoctoral Researcher - Switzer Lab
>>>>> Dept. of Biochemistry
>>>>> University of Illinois Urbana-Champaign
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>> Prielinger
>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>> To: bioperl-l at bioperl.org
>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>> output
>>>>>>
>>>>>> Hi,
>>>>>> If I want to parse a Blast Output (Version 2.2.12) with Bio::SearchIO,
>>>>>> I get the following error message:
>>>>>>
>>>>>> MSG: no data for midline Query 1 WWWKWRW 7
>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>> STACK toplevel
>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>
>>>>>> Is that a bug?
>>>>>>
>>>>>> If I want to parse Blast Output (version 2.2.13), I don't get
>>>>>> anything.
>>>>>> I'm using bioperl 1.4.
>>>>>>
>>>>>> Before I installed bioperl 1.4, it worked fine parsing Blast
>>>>>> Output (version 2.2.12), but I don't remember which bioperl version I
>>>>>> had installed.
>>>>>>
>>>>>> thanks in advance
>>>>>>
>>>>>> Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>>>
>>
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign