[Bioperl-l] How to extract organism information from blast report?

ying lin ylin at mail.eecis.udel.edu
Tue Jul 1 13:42:57 EDT 2003


Unfortunately, the [ORGANISM] information has been truncated from what
$hit->description returns, so I can't apply the regexp to $hit->description
to parse out the organism name.

Ying Lin
Dept. of Computer and Information Sciences
University of Delaware
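If the description is cut off partway through the bracketed name, Jason's regexp simply fails to match. That looks exactly like a hit that never had an organism tag. Below is a minimal sketch that tells the cases apart. It assumes truncation leaves an opening '[' with no matching ']' at the end of the string. organism_status() is a hypothetical helper, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl method. Classifies a hit description:
#   ('complete',  NAME)    a full [Organism] tag was found
#   ('truncated', PARTIAL) the string ends inside an unclosed '['
#   ('absent',    undef)   no usable bracket at all
sub organism_status {
    my ($desc) = @_;
    $desc = '' unless defined $desc;

    # Same pattern as Jason's regexp: the first complete [...] tag.
    return ('complete', $1) if $desc =~ /\[([^\]]+)\]/;

    # Assumption: truncation leaves '[' with no ']' before the end of string.
    return ('truncated', $1) if $desc =~ /\[([^\]]*)\z/;

    return ('absent', undef);
}

for my $desc ('cytoplasmic beta actin [Xenopus laevis]',
              'cytoplasmic beta actin [Xenopus la',
              'cytoplasmic beta actin') {
    my ($status, $name) = organism_status($desc);
    printf "%-9s %s\n", $status, defined $name ? $name : '-';
}
```

If the tag is removed entirely rather than cut mid-name, this can only report 'absent'. In that case the name has to come from somewhere other than $hit->description.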

On Tue, 1 Jul 2003, Jason Stajich wrote:

> Locus and accession are parsed out. We've tried not to assume that every
> report is for an NCBI-formatted database.  I suppose we could add an
> organism field when we recognize the header line (one could probably do
> the same for Bio::Seq and species once we add an NCBIFasta parser at some
> point).
>
> At any rate, I assume this regexp will get you what you want:
> my ($org) = ($hit->description =~ /\[([^\]]+)\]/ );
>
> if( $org ) {
>  # do something with the org name
> }
>
> -jason
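Jason's regexp returns only the first bracketed name. Ying's excerpt shows a merged entry in which each defline carries its own [Organism] tag. For entries like that, a variant using /g in list context collects every tag. This is a minimal sketch. organisms_from_description() is a hypothetical helper. It assumes the description string actually contains the merged deflines, which depends on what the parser keeps.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl method: every distinct name found in
# square brackets in a description, in order of first appearance.
sub organisms_from_description {
    my ($desc) = @_;
    return () unless defined $desc;
    my (%seen, @orgs);
    # With /g in list context the match returns all captures, not just the first.
    for my $org ($desc =~ /\[([^\]]+)\]/g) {
        (my $clean = $org) =~ s/\s+/ /g;    # collapse wrapped whitespace
        push @orgs, $clean unless $seen{$clean}++;
    }
    return @orgs;
}

my $desc = 'cytoplasmic beta actin [Xenopus laevis] '
         . 'gi|27735427|gb|AAH41203.1| similar to actin, beta, cytoplasmic [Xenopus laevis]';
print join(', ', organisms_from_description($desc)), "\n";    # prints "Xenopus laevis"
```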
> On Tue, 1 Jul 2003, ying lin wrote:
>
> > I thought that, since methods are provided to extract the name, the
> > description and a bunch of other hit information, there would also be
> > a method to extract the hit's organism.
> >
> > Below is an example hit, excerpted from a BLAST report, that has the
> > [ORGANISM] information after the name and description:
> >
> > >gi|3348131|gb|AAC27796.1|   cytoplasmic beta actin [Xenopus laevis]
> >  gi|27735427|gb|AAH41203.1|   similar to actin, beta, cytoplasmic [Xenopus
> > laevis]
> >           Length = 375
> >
> >  Score =  754 bits (1947), Expect = 0.0
> >  Identities = 371/375 (98%), Positives = 375/375 (100%)
> >  Frame = +2
> >
> > Query: 89   MEEEIAALVIDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS 268
> >             ME++IAALV+DNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS
> > Sbjct: 1    MEDDIAALVVDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS 60
> >
> > Query: 269  KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT 448
> >             KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT
> > Sbjct: 61   KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT 120
> >
> > Query: 449  QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL 628
> >             QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL
> > Sbjct: 121  QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL 180
> >
> > Query: 629  AGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSY 808
> >             AGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSY
> > Sbjct: 181  AGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSY 240
> >
> > Query: 809  ELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETTFNSIMKCDVDIRKDLYANTVLS 988
> >             ELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETT+NSIMKCDVDIRKDLYANTVLS
> > Sbjct: 241  ELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETTYNSIMKCDVDIRKDLYANTVLS 300
> >
> > Query: 989  GGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQ 1168
> >             GGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQ
> > Sbjct: 301  GGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQ 360
> >
> > Query: 1169 EYDESGPSIVHRKCF 1213
> >             EYDESGPSIVHRKCF
> > Sbjct: 361  EYDESGPSIVHRKCF 375
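When the organism tag has been lost from $hit->description, one fallback is to read the hit's header block directly from the raw report text. Below is a minimal sketch that is independent of Bio::SearchIO. It assumes NCBI gi-style deflines like the ones in this excerpt, which is exactly the assumption Jason says SearchIO avoids making. organisms_from_header() is a hypothetical helper. It joins wrapped lines first, because in this excerpt "[Xenopus" and "laevis]" end up on separate lines.

```perl
use strict;
use warnings;

# Hypothetical helper, independent of Bio::SearchIO. Takes the raw header
# block of one hit (from '>' down to 'Length =') and returns
# [identifier, organism] pairs, one per defline. Assumes NCBI gi-style
# deflines, where each merged entry starts with 'gi|'.
sub organisms_from_header {
    my ($header) = @_;
    $header =~ s/^>//;                    # drop the leading '>'
    $header =~ s/\s*Length\s*=.*\z//s;    # drop 'Length = ...' to end of block
    $header =~ s/\s+/ /g;                 # join wrapped lines
    $header =~ s/^ //;                    # tidy a leading space, if any
    my @pairs;
    for my $defline (split / (?=gi\|)/, $header) {
        my ($id)  = $defline =~ /^(\S+)/;
        # Last bracketed tag on the defline, if any.
        my ($org) = $defline =~ /\[([^\]]+)\]\s*\z/;
        push @pairs, [$id, $org];
    }
    return @pairs;
}

my $header = <<'END';
>gi|3348131|gb|AAC27796.1|   cytoplasmic beta actin [Xenopus laevis]
 gi|27735427|gb|AAH41203.1|   similar to actin, beta, cytoplasmic [Xenopus
laevis]
          Length = 375
END

for my $pair (organisms_from_header($header)) {
    my ($id, $org) = @$pair;
    printf "%s\t%s\n", $id, defined $org ? $org : '(none)';
}
```

The bracket match is anchored at the end of each defline, so it picks up the last bracketed tag. NCBI nr deflines normally put the organism there. As a result, a description such as "protein [fragment] [Homo sapiens]" would still yield the species.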
> >
> >
> >
> >
> >
> > Ying Lin
> > Dept. of Computer and Information Sciences
> > University of Delaware
> >
> > On Tue, 1 Jul 2003, Jason Stajich wrote:
> >
> > > Is the organism's name actually in the report?
> > >
> > > With SearchIO, $hit->name and $hit->description will give you whatever
> > > is in the report.
> > >
> > > -jason
> > > On Tue, 1 Jul 2003, ying lin wrote:
> > >
> > > > Hi, I have been trying to find the method that extracts the organism
> > > > name of hits in a BLAST report, but I can't find it. Can anyone tell me
> > > > what it is? Thanks!
> > > >
> > > > Ying Lin
> > > > Dept. of Computer and Information Sciences
> > > > University of Delaware
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at portal.open-bio.org
> > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > >
> > > --
> > > Jason Stajich
> > > Duke University
> > > jason at cgt.mc.duke.edu
> > >
> >
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>


