[Bioperl-l] parsing blast HIT with multiple GIs

suhoiy suhoiy@21cn.com
Thu, 31 Oct 2002 23:22:19 GMT


Oh! It works now that I have updated bioperl from 1.0 to 1.0.2!
Now next_result reads these lines into the description. :D
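
Since the extra GI lines now end up in the description, something like the
following could pull out every GI for a hit. This is only a rough sketch:
extract_gis is a hypothetical helper of my own, not part of BioPerl, and it
assumes each entry shows up as "gi|<digits>|" at the start of the text or
after whitespace, as in the output Jason posted. The sample strings are
copied from the Septin 7 hit in my earlier mail; in a real script the text
would come from $hit->name and $hit->description.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): collect every GI number
# found in a hit's name plus its description text.
sub extract_gis {
    my ($text) = @_;
    # Match "gi|<digits>|" only at the start of the text or after
    # whitespace, so a stray "gi|" inside another token is ignored.
    my @gis = $text =~ /(?:^|\s)gi\|(\d+)\|/g;
    return @gis;
}

# Sample name and description, standing in for $hit->name and
# $hit->description (assumed to be joined with single spaces).
my $name = 'gi|9789726|sp|O55131|SEP7_MOUSE';
my $desc = 'Septin 7 (CDC10 protein homolog) '
         . 'gi|2864606|emb|CAA11547.1| (AJ223782) CDC10 [Mus musculus]';

my @gis = extract_gis("$name $desc");
print join(",", @gis), "\n";    # prints 9789726,2864606
```

The name is included in the text on purpose, because the first GI of a hit
lives in the name and only the later ones are in the description.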

many thanks! 

suhoiy

---Original Message---
From: Jason Stajich<jason@cgt.mc.duke.edu>
Subject: Re: Re: [Bioperl-l] parsing blast HIT with multiple GIs
 
>The following code works for me and returns the full description. Are you
>using an old version of bioperl?
>
>#!/usr/bin/perl -w
>use strict;
>use Bio::SearchIO;
>
>my $in = new Bio::SearchIO(-file   => shift,
>			   -format => 'blast');
>
>while( my $r = $in->next_result ) {
>    while ( my $hit = $r->next_hit ) {
>	print $hit->name, " ", $hit->description, "\n\n";
>    }
>    last;
>}
>
>[report snippet]
>
>>gi|19114466|ref|NP_593554.1| (NC_003424) hypothetical PHD type zinc
>             finger protein wiht BAH domain [Schizosaccharomyces pombe]
> gi|1351695|sp|Q10077|YANC_SCHPO Hypothetical protein C3H1.12c in
>             chromosome I
> gi|7491117|pir||T38744 hypothetical protein SPAC3H1.12c - fission yeast
>              (Schizosaccharomyces pombe)
> gi|1103513|emb|CAA92265.1| (Z68144) hypothetical PHD type zinc finger
>             protein wiht BAH domain [Schizosaccharomyces pombe]
>
>
>[jason@sonogno test]$ perl blast_parse_fulldesc.pl matloc_nr.blastx
>gi|19114466|ref|NP_593554.1| (NC_003424) hypothetical PHD type zinc finger
>protein wiht BAH domain [Schizosaccharomyces pombe] gi|1351695|sp|Q10077|YANC_SCHPO
>Hypothetical protein C3H1.12c in chromosome I gi|7491117|pir||T38744
>hypothetical protein SPAC3H1.12c - fission yeast (Schizosaccharomyces pombe) gi|1103513|emb|CAA92265.1| (Z68144)
>hypothetical PHD type zinc finger protein wiht BAH domain [Schizosaccharomyces pombe]
>
>
>
>On Thu, 31 Oct 2002, suhoiy wrote:
>
>> I am sorry I did not express myself clearly. For example, here is part of a hit:
>>
>> ==========
>>
>> >gi|9789726|sp|O55131|SEP7_MOUSE   Septin 7 (CDC10 protein homolog)
>>  gi|2864606|emb|CAA11547.1|   (AJ223782) CDC10 [Mus musculus]
>>           Length = 436
>>
>>  Score =  225 bits (574), Expect = 9e-58
>>  Identities = 119/284 (41%), Positives = 184/284 (63%)
>>  Frame = -1
>>
>> ==========
>>
>> $hit->description returned "Septin 7 (CDC10 protein homolog)",
>> while I want " gi|2864606|emb|CAA11547.1|...".
>>
>Everything that is not in the first (\S+) is considered the
>description.
>
>> I traced the source code; those lines are discarded in next_result. :(
>>
>Not supposed to be - perhaps an old version?
>>
>> ---Original Message---
>> From: Jason Stajich<jason@cgt.mc.duke.edu>
>> Subject: Re: [Bioperl-l] parsing blast HIT with multiple GIs
>>
>> >they are in the $hit->description part.
>> >
>> >-jason
>> >On Thu, 31 Oct 2002, suhoiy wrote:
>> >
>> >> Hello all,
>> >>
>> >> When I parsed some protein blast results, I found that some hits had
>> >> multiple GIs, which are identical sequences. The SearchIO module
>> >> discards the GIs without a leading >. How can I get all the GIs
>> >> in a blast result? Any suggestions?
>> >>
>> >> Many thanks! :)
>> >>
>> >> suhoiy
>> >>
>> >>
>> >>
>> >
>> >--
>> >Jason Stajich
>> >Duke University
>> >jason at cgt.mc.duke.edu
>> >
>>
>>
>> ---End of Message---
>>
>
>-- 
>Jason Stajich
>Duke University
>jason at cgt.mc.duke.edu
>