[Bioperl-l] How to extract organism information from blast report?

Stefano Ghignone ste.ghi at libero.it
Tue Jul 1 22:41:13 EDT 2003


Hi!
This code retrieves the binomial name of the organism from a GenBank-format sequence record, given its accession number.

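use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

# Accession number to look up, taken from the command line.
my $accnum = $ARGV[0] or die "Usage: $0 <accession>\n";
# Fetch through a temporary file and parse the record as GenBank format.
my @args = (-retrievaltype => 'tempfile', -format => 'GenBank');
my $gb = Bio::DB::GenBank->new(@args);
my $seqio = $gb->get_Stream_by_acc($accnum);
while ( my $orgn = $seqio->next_seq() ) {
    my $species = $orgn->species();
    # binomial('FULL') also appends the subspecies name, if there is one.
    printf "\t%s\n", $species->binomial('FULL');
}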

You can use it as a subroutine in a larger script (e.g. "Running Remote Blast", Exercise 4.4 of the Pasteur Institute Bioperl Course): pass the value of $hit->accession from a blast report as the argument, and then work with the string returned by $species->binomial('FULL').
The problem is that the code works only with bioperl v. 1.0 (as I reported in bug report #1460), and not with the current 1.2.1 distribution.
I hope the problem will be resolved with the next release.
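
To make the subroutine idea concrete, here is a rough sketch of how the lookup might be wrapped. The name organism_for_accession and the $db argument are hypothetical and not part of bioperl; $db is assumed to behave like the Bio::DB::GenBank object above. I have not checked it against the 1.2.1 problem mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl): look up one accession through
# a database handle and return the organism's binomial name, or undef.
# $db is assumed to provide get_Stream_by_acc(), returning a stream that
# supports next_seq(), as Bio::DB::GenBank does.
sub organism_for_accession {
    my ($db, $accnum) = @_;
    return unless defined $accnum && length $accnum;

    # Trap network or parsing failures so that one bad hit does not
    # abort the loop over the whole report.
    my $seqio = eval { $db->get_Stream_by_acc($accnum) };
    return unless $seqio;

    my $seq = eval { $seqio->next_seq() };
    return unless $seq;

    my $species = $seq->species();
    return unless $species;
    return $species->binomial('FULL');
}

# Assumed usage inside a SearchIO loop over a parsed report ($result):
#   my $gb = Bio::DB::GenBank->new(-format => 'GenBank');
#   while ( my $hit = $result->next_hit ) {
#       my $name = organism_for_accession($gb, $hit->accession);
#       print $hit->name, "\t", (defined $name ? $name : 'unknown'), "\n";
#   }
```

Passing the database handle in, rather than creating it inside the subroutine, lets one connection object be reused for every hit in the report.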
Ciao!
Stefano Ghignone





>Unfortunately, the [ORGANISM] information has been truncated from the value
>returned by $hit->description, so I can't apply the regexp to
>$hit->description to parse out the organism name.
>
>Ying Lin
>Dept. of Computer and Information Sciences
>University of Delaware
>
>On Tue, 1 Jul 2003, Jason Stajich wrote:
> locus and accession are parsed out. We've tried not to assume that every
> report is for a NCBI formatted database.  I suppose we can add in an
> organism field when we recognize the header line.  (one could probably do
> this too for Bio::Seq and species when we add a NCBIFasta parser at some
> point).
>
> At any rate I assume this regexp will get you what you want:
> my ($org) = ($hit->description =~ /\[([^\]]+)\]/ );
>
> if( $org ) {
>  # do something with the org name
> }
>
> -jason
> On Tue, 1 Jul 2003, ying lin wrote:
>
> > I thought that as methods to extract name and description as well as a
> > bunch of other hit information have been provided, there should be some
> > method to extract hit organism.
> >
> > Below is one example of hsp which has the [ORGANISM] information after
> > name and description excerpted from blast report:
> >
> > >gi|3348131|gb|AAC27796.1|   cytoplasmic beta actin [Xenopus laevis]
> >  gi|27735427|gb|AAH41203.1|   similar to actin, beta, cytoplasmic [Xenopus laevis]
> >           Length = 375
> >
> >  Score =  754 bits (1947), Expect = 0.0
> >  Identities = 371/375 (98%), Positives = 375/375 (100%)
> >  Frame = +2
> >
> > Query: 89   MEEEIAALVIDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS 268
> >             ME++IAALV+DNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS
> > Sbjct: 1    MEDDIAALVVDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS 60
> >
> > Query: 269  KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT 448
> >             KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT
> > Sbjct: 61   KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT 120
> >
> > Query: 449  QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL 628
> >             QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL
> > Sbjct: 121  QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL 180
> >
> > Query: 629  AGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSY 808
> >             AGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSY
> > Sbjct: 181  AGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSY 240
> >
> > Query: 809  ELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETTFNSIMKCDVDIRKDLYANTVLS 988
> >             ELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETT+NSIMKCDVDIRKDLYANTVLS
> > Sbjct: 241  ELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETTYNSIMKCDVDIRKDLYANTVLS 300
> >
> > Query: 989  GGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQ 1168
> >             GGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQ
> > Sbjct: 301  GGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQ 360
> >
> > Query: 1169 EYDESGPSIVHRKCF 1213
> >             EYDESGPSIVHRKCF
> > Sbjct: 361  EYDESGPSIVHRKCF 375
> >
> >
> >
> >
> >
> > Ying Lin
> > Dept. of Computer and Information Sciences
> > University of Delaware
> >
> > On Tue, 1 Jul 2003, Jason Stajich wrote:
> >
> > > Is the organism's name actually in the report?
> > >
> > > With SearchIO,
> > > $hit->name, $hit->description will give you whatever is in the report.
> > >
> > > -jason
> > > On Tue, 1 Jul 2003, ying lin wrote:
> > >
> > > > Hi, I have been trying to find the method that extracts the organism
> > > > name of the hits in a blast report, but I can't find it. Can anyone
> > > > tell me what it is? Thanks!
> > > >
> > > > Ying Lin
> > > > Dept. of Computer and Information Sciences
> > > > University of Delaware
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at portal.open-bio.org
> > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > >
> > > --
> > > Jason Stajich
> > > Duke University
> > > jason at cgt.mc.duke.edu
> > >
> >
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>

