[Bioperl-l] How to extract organism information from blast report?

Jason Stajich jason at cgt.duhs.duke.edu
Tue Jul 1 13:38:53 EDT 2003


Locus and accession are parsed out, but the organism is not. We've tried not
to assume that every report is for an NCBI-formatted database. I suppose we
could add an organism field when we recognize an NCBI-style header line. (One
could probably do the same for Bio::Seq and its species when we add an
NCBIFasta parser at some point.)
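Here is a minimal sketch of what recognizing an NCBI-style header line might look like. It assumes the "gi|NUM|DB|ACCESSION|LOCUS description [Organism]" layout seen in the example report quoted below. The subroutine name parse_ncbi_defline and the hash keys it returns are hypothetical and are not part of the BioPerl API. A line that does not start with "gi|" yields an empty list, so non-NCBI databases can fall back to the plain name and description.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl method: split a defline assumed to
# follow the "gi|NUM|DB|ACCESSION|LOCUS description [Organism]" layout.
# Returns an empty list when the line does not look NCBI-formatted, so a
# caller can fall back to the plain name and description.
sub parse_ncbi_defline {
    my ($line) = @_;
    $line =~ s/^>//;    # tolerate a leading FASTA/BLAST '>'

    my ( $gi, $db, $acc, $locus, $rest ) =
        $line =~ /^gi\|(\d+)\|(\w+)\|([^|]*)\|(\S*)\s*(.*)$/
        or return;

    # Treat trailing bracketed text as the organism, if there is any
    my ( $desc, $org ) = $rest =~ /^(.*?)\s*\[([^\[\]]+)\]\s*$/;
    $desc = $rest unless defined $desc;

    return (
        gi          => $gi,
        db          => $db,
        accession   => $acc,
        locus       => $locus,   # empty string when no locus follows the accession
        description => $desc,
        organism    => $org,     # undef when no [Organism] tag is present
    );
}

my %f = parse_ncbi_defline(
    '>gi|3348131|gb|AAC27796.1|   cytoplasmic beta actin [Xenopus laevis]');
printf "%s\t%s\t%s\n", $f{accession}, $f{description}, $f{organism};
```

Taking only the last bracketed group is a deliberate choice. It avoids mistaking brackets that are part of a protein name for the organism. However, it will still be wrong for any database that puts something other than the species at the end.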

At any rate, I assume this regexp will get you what you want:

# Capture the text between the first pair of square brackets
my ($org) = ( $hit->description =~ /\[([^\]]+)\]/ );

if ( $org ) {
    # do something with the organism name
}
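The same regexp can be tried without a Bio::SearchIO loop. The sketch below applies it to plain strings instead of $hit->description. The helper names first_organism and all_organisms are hypothetical. The first two sample strings are copied from the example report. The third assumes that merged deflines end up joined into a single description string. The last is made up to show the no-organism case. Matching with /g in list context collects every bracketed name. Note that brackets that belong to the protein name itself would be picked up as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# First bracketed string: the same regexp as in the reply above,
# applied to a plain string instead of $hit->description.
sub first_organism {
    my ($desc) = @_;
    my ($org) = ( $desc =~ /\[([^\]]+)\]/ );
    return $org;    # undef when the description has no [...] at all
}

# Every bracketed string, in order (list-context match with /g).
sub all_organisms {
    my ($desc) = @_;
    return ( $desc =~ /\[([^\]]+)\]/g );
}

# Sample descriptions: the first two are copied from the example report;
# the third assumes merged deflines are joined into one string; the last
# is made up to show the no-organism case.
my @descriptions = (
    'cytoplasmic beta actin [Xenopus laevis]',
    'similar to actin, beta, cytoplasmic [Xenopus laevis]',
    'cytoplasmic beta actin [Xenopus laevis] gi|27735427|gb|AAH41203.1| '
        . 'similar to actin, beta, cytoplasmic [Xenopus laevis]',
    'hypothetical protein',
);

for my $desc (@descriptions) {
    my $org   = first_organism($desc);
    my @all   = all_organisms($desc);
    my $label = defined $org ? $org : '(no organism)';
    print "$label\t", scalar(@all), " bracketed name(s)\n";
}
```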

-jason
On Tue, 1 Jul 2003, ying lin wrote:

> I thought that, since methods to extract the name and description as well
> as a bunch of other hit information have been provided, there should be
> some method to extract the hit organism.
>
> Below is an example hit, excerpted from a BLAST report, which has the
> [ORGANISM] information after the name and description:
>
> >gi|3348131|gb|AAC27796.1|   cytoplasmic beta actin [Xenopus laevis]
>  gi|27735427|gb|AAH41203.1|   similar to actin, beta, cytoplasmic [Xenopus laevis]
>           Length = 375
>
>  Score =  754 bits (1947), Expect = 0.0
>  Identities = 371/375 (98%), Positives = 375/375 (100%)
>  Frame = +2
>
> Query: 89   MEEEIAALVIDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS 268
>             ME++IAALV+DNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS
> Sbjct: 1    MEDDIAALVVDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS 60
>
> Query: 269  KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT 448
>             KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT
> Sbjct: 61   KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT 120
>
> Query: 449  QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL 628
>             QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL
> Sbjct: 121  QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL 180
>
> Query: 629  AGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSY 808
>             AGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSY
> Sbjct: 181  AGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSY 240
>
> Query: 809  ELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETTFNSIMKCDVDIRKDLYANTVLS 988
>             ELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETT+NSIMKCDVDIRKDLYANTVLS
> Sbjct: 241  ELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETTYNSIMKCDVDIRKDLYANTVLS 300
>
> Query: 989  GGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQ 1168
>             GGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQ
> Sbjct: 301  GGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQ 360
>
> Query: 1169 EYDESGPSIVHRKCF 1213
>             EYDESGPSIVHRKCF
> Sbjct: 361  EYDESGPSIVHRKCF 375
>
> Ying Lin
> Dept. of Computer and Information Sciences
> University of Delaware
>
> On Tue, 1 Jul 2003, Jason Stajich wrote:
>
> > Is the organism's name actually in the report?
> >
> > With SearchIO, $hit->name and $hit->description will give you
> > whatever is in the report.
> >
> > -jason
> > On Tue, 1 Jul 2003, ying lin wrote:
> >
> > > Hi, I have been trying to find the method to extract the organism's
> > > name of hits in a BLAST report but can't find it. Can anyone tell me
> > > what it is? Thanks!
> > >
> > > Ying Lin
> > > Dept. of Computer and Information Sciences
> > > University of Delaware
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu
> >
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

