[Bioperl-l] parsing blast HIT with multiple GIs

Jason Stajich jason@cgt.mc.duke.edu
Thu, 31 Oct 2002 09:29:58 -0500 (EST)


The following code works for me and returns the full description. Are you
using an old version of bioperl?

#!/usr/bin/perl -w
use strict;
use Bio::SearchIO;

my $in = Bio::SearchIO->new(-file   => shift,
                            -format => 'blast');

while( my $r = $in->next_result ) {
    while ( my $hit = $r->next_hit ) {
	print $hit->name, " ", $hit->description, "\n\n";
    }
    last;
}

[report snippet]

>gi|19114466|ref|NP_593554.1| (NC_003424) hypothetical PHD type zinc
             finger protein wiht BAH domain [Schizosaccharomyces pombe]
 gi|1351695|sp|Q10077|YANC_SCHPO Hypothetical protein C3H1.12c in
             chromosome I
 gi|7491117|pir||T38744 hypothetical protein SPAC3H1.12c - fission yeast
              (Schizosaccharomyces pombe)
 gi|1103513|emb|CAA92265.1| (Z68144) hypothetical PHD type zinc finger
             protein wiht BAH domain [Schizosaccharomyces pombe]


[jason@sonogno test]$ perl blast_parse_fulldesc.pl matloc_nr.blastx
gi|19114466|ref|NP_593554.1| (NC_003424) hypothetical PHD type zinc finger
protein wiht BAH domain [Schizosaccharomyces pombe] gi|1351695|sp|Q10077|YANC_SCHPO
Hypothetical protein C3H1.12c in chromosome I gi|7491117|pir||T38744
hypothetical protein SPAC3H1.12c - fission yeast (Schizosaccharomyces pombe) gi|1103513|emb|CAA92265.1| (Z68144)
hypothetical PHD type zinc finger protein wiht BAH domain [Schizosaccharomyces pombe]
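
Since the name and the description together carry every identifier for the
hit, one way to get at all the GIs is a simple pattern match over both. This
is only a rough sketch: all_gis() is a made-up helper, not part of bioperl,
and it assumes every identifier appears in the gi|NUMBER| form shown in the
output above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a bioperl method): collect every GI number
# found in a hit's name and description.
# Assumes identifiers look like "gi|12345|..." and are whitespace separated.
sub all_gis {
    my ($name, $description) = @_;
    my $text = join ' ', grep { defined } $name, $description;
    # In list context, a /g match with one capture group returns all captures.
    my @gis = $text =~ /(?:^|\s)gi\|(\d+)\|/g;
    return @gis;
}

# Stand-alone demo using strings copied from the report above.
# In a SearchIO loop you would pass $hit->name and $hit->description instead.
my $name = 'gi|19114466|ref|NP_593554.1|';
my $desc = '(NC_003424) hypothetical PHD type zinc finger protein wiht BAH domain '
         . '[Schizosaccharomyces pombe] gi|1351695|sp|Q10077|YANC_SCHPO '
         . 'Hypothetical protein C3H1.12c in chromosome I gi|7491117|pir||T38744 '
         . 'hypothetical protein SPAC3H1.12c - fission yeast (Schizosaccharomyces pombe)';

print join(', ', all_gis($name, $desc)), "\n";
```

Inside the loop from the script above this would be called as
all_gis($hit->name, $hit->description).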



On Thu, 31 Oct 2002, suhoiy wrote:

> I am sorry I did not express myself clearly. For example, here is part of a hit:
>
> ==========
>
> >gi|9789726|sp|O55131|SEP7_MOUSE   Septin 7 (CDC10 protein homolog)
>  gi|2864606|emb|CAA11547.1|   (AJ223782) CDC10 [Mus musculus]
>           Length = 436
>
>  Score =  225 bits (574), Expect = 9e-58
>  Identities = 119/284 (41%), Positives = 184/284 (63%)
>  Frame = -1
>
> ==========
>
> $hit->description returned "Septin 7 (CDC10 protein homolog)",
> while I want " gi|2864606|emb|CAA11547.1|...".
>
Everything that is not in the first (\S+) is considered the
description.
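
To illustrate that rule (this is just a sketch of the idea, not the actual
SearchIO source; split_defline() is a made-up name and the whitespace
handling is my assumption about how wrapped lines get joined):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a hit definition line into (name, description).
# The first run of non-whitespace characters is the name; the rest,
# including any further gi|...| entries, is the description.
sub split_defline {
    my ($defline) = @_;
    $defline =~ s/^>//;        # drop the leading '>' of the first entry
    $defline =~ s/\s+/ /g;     # collapse wrapped lines and runs of spaces
    $defline =~ s/^ | $//g;    # trim the single space left at either end
    my ($name, $desc) = $defline =~ /^(\S+)\s*(.*)$/;
    return ($name, $desc);
}

# Demo with the two-entry hit quoted in this thread.
my $defline = ">gi|9789726|sp|O55131|SEP7_MOUSE   Septin 7 (CDC10 protein homolog)\n"
            . " gi|2864606|emb|CAA11547.1|   (AJ223782) CDC10 [Mus musculus]";
my ($name, $desc) = split_defline($defline);
print "name: $name\n";
print "desc: $desc\n";
```

With this split, the second gi|...| entry ends up in the description rather
than being dropped.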

> I traced the source code; those lines are discarded in next_result. :(
>
Not supposed to be - perhaps an old version?
>
> ---Original Message---
> From: Jason Stajich<jason@cgt.mc.duke.edu>
> Subject: Re: [Bioperl-l] parsing blast HIT with multiple GIs
>
> >they are in the $hit->description part.
> >
> >-jason
> >On Thu, 31 Oct 2002, suhoiy wrote:
> >
> >> Hello all,
> >>
> >> When I parse some blast results of proteins, I found some hits had
> >> multiple GIs, which are identical sequences. The SearchIO module
> >> discards the GIs without a leading >. Then how can I get all the GIs
> >> in a blast result? Any suggestions?
> >>
> >> Many thanks! :)
> >>
> >> suhoiy
> >>
> >>
> >>
> >
> >--
> >Jason Stajich
> >Duke University
> >jason at cgt.mc.duke.edu
> >
>
>
> ---End of Message---
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu