[Bioperl-l] extracting GI number from BLAST hit
Jason Stajich
jason.stajich at duke.edu
Fri Sep 17 09:35:19 EDT 2004
Is the GI number actually on the hit line in the report, or only in the
description down in the HSP section?
We only report what is in the report itself. Can you send a sample report
that has the GI number in it?
You may want to run your BLAST with -I T:
  -I  Show GI's in deflines [T/F]
-jason
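
If the report is regenerated with -I T, the hit identifier would be expected to carry a leading gi|...| field (for example gi|30260195|ref|NC_003997.3|), and the GI could then be pulled out of the string with a regular expression. A minimal sketch in plain Perl follows; extract_gi is a hypothetical helper written for illustration, not a BioPerl method, and it assumes the identifier uses the pipe-delimited NCBI layout shown in the FASTA header quoted below.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the GI number found in an
# NCBI-style identifier such as "gi|30260195|ref|NC_003997.3|".
# Returns undef when the string has no gi| field.
sub extract_gi {
    my ($id) = @_;
    return unless defined $id;
    # Match "gi|<digits>" at the start of the string or after another "|".
    return $1 if $id =~ /(?:^|\|)gi\|(\d+)/;
    return;
}

# One identifier with a GI field and one without, as in the output quoted below.
for my $name ('gi|30260195|ref|NC_003997.3|', 'ref|NC_002940.2|') {
    my $gi = extract_gi($name);
    printf "%s => %s\n", $name, defined $gi ? $gi : 'NOGI';
}
```

Inside the hit loop this would be called as extract_gi($hit->name), on the assumption that the name holds the full identifier once the GI is present in the report.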
On Sep 16, 2004, at 11:55 AM, Joshua Orvis wrote:
> How can one extract the GI number from hits when doing BLAST against
> an NCBI-formatted BLAST database?
>
> Each entry in the original multi-FASTA file was like this:
>
> >gi|30260195|ref|NC_003997.3| Bacillus anthracis str. Ames, complete genome
> [sequence .....]
>
> and formatting was done like:
>
> # formatdb -i filename.fna -p F -o T
>
> When I BLAST and parse the hit section I cannot see how to get the GI
> number out of each hit. This code:
>
> ## returns a Bio::SearchIO::blast object
> $report = $fact->blastall($seq);
>
> ## returns a Bio::Search::Result::BlastResult object
> while( my $result = $report->next_result ) {
>
>     ## returns a Bio::Search::Hit::BlastHit object
>     while( my $hit = $result->next_hit ) {
>
>         my $acc   = $hit->accession   || 'NOACC';
>         my $desc  = $hit->description || 'NODESC';
>         my $name  = $hit->name        || 'NONAME';
>         my $locus = $hit->locus       || 'NOLOC';
>
>         print "$acc - $desc - $name - $locus\n";
>
>         ## returns a Bio::Search::HSP::GenericHSP object
>         while( my $hsp = $hit->next_hsp ) {
>             ## TODO, grab the alignments in a bit
>         }
>     }
> }
>
> generates output like this:
>
> NC_002940 - Haemophilus ducreyi 35000HP, complete genome - ref|NC_002940.2| - NOLOC
> NC_004088 - Yersinia pestis KIM, complete genome - ref|NC_004088.1| - NOLOC
> NC_003143 - Yersinia pestis strain CO92, complete genome - ref|NC_003143.1| - NOLOC
> NC_002516 - Pseudomonas aeruginosa PA01, complete genome - ref|NC_002516.1| - NOLOC
> NC_002677 - Mycobacterium leprae strain TN complete genome - ref|NC_002677.1| - NOLOC
>
>
> I expected that I could parse it out of the description line, but it
> appears to be stripped out at some earlier stage. I'm probably just
> missing a method somewhere in the docs. Any suggestions?
>
> Joshua
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/