[Bioperl-l] How to extract organism information from blast report?
ying lin
ylin at mail.eecis.udel.edu
Tue Jul 1 13:42:57 EDT 2003
Unfortunately, the [ORGANISM] information has been truncated from the string
returned by $hit->description, so I can't apply the regexp to
$hit->description to parse out the organism name.
Ying Lin
Dept. of Computer and Information Sciences
University of Delaware
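If the description string returned by SearchIO is cut short, one workaround is to skip $hit->description and read the alignment headers straight from the report text. The Perl below is a minimal sketch. It assumes standard NCBI BLAST text output in which each alignment header starts with ">" and may wrap over several lines until the "Length =" line, as in the Xenopus example quoted in this thread. The subroutine name organisms_from_report is hypothetical and not part of BioPerl, and the pattern assumes that text inside square brackets is an organism name.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): scan raw NCBI BLAST text output
# and return one [ header, [organisms] ] pair per alignment header.
# Assumes each header starts with '>' and ends at the 'Length =' line,
# and that text inside [ ] is an organism name.
sub organisms_from_report {
    my ($fh) = @_;
    my @results;
    my $header;    # header text being accumulated; undef outside a header

    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>(.*)/ ) {
            # Start of a new alignment header.
            $header = $1;
        }
        elsif ( defined $header && $line =~ /^\s*Length\s*=/ ) {
            # Header complete: collapse the wrapping whitespace, then collect
            # every distinct bracketed name in order of first appearance.
            $header =~ s/\s+/ /g;
            my %seen;
            my @orgs = grep { !$seen{$_}++ } ( $header =~ /\[([^\]]+)\]/g );
            push @results, [ $header, \@orgs ];
            undef $header;
        }
        elsif ( defined $header ) {
            # Continuation of a wrapped header line.
            $header .= " $line";
        }
    }
    return @results;
}
```

Passing a filehandle opened on a report file returns one entry per alignment header; the second element of each entry holds the de-duplicated organism names. Joining the wrapped lines first is what lets a name split across lines, such as "[Xenopus" / "laevis]", be matched as one.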
On Tue, 1 Jul 2003, Jason Stajich wrote:
> Locus and accession are parsed out. We've tried not to assume that every
> report is for an NCBI-formatted database. I suppose we could add an
> organism field when we recognize the header line. (One could probably do
> the same for Bio::Seq and species when we add an NCBIFasta parser at some
> point.)
>
> At any rate, I assume this regexp will get you what you want:
> my ($org) = ($hit->description =~ /\[([^\]]+)\]/ );
>
> if( $org ) {
>     # do something with the org name
> }
>
> -jason
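Jason's one-liner keeps only the first bracketed match. When a header carries several deflines, like the gi|3348131 / gi|27735427 header in the example report below, the same pattern with the /g modifier collects every bracketed name. This is a minimal sketch that assumes, as the one-liner does, that bracketed text is an organism name; organisms_from_description is a hypothetical helper, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): return every distinct
# [bracketed] name in a hit description, in order of first appearance.
# Same pattern as the one-liner, but with /g so later matches are kept.
sub organisms_from_description {
    my ($desc) = @_;
    return () unless defined $desc;
    my %seen;
    return grep { !$seen{$_}++ } ( $desc =~ /\[([^\]]+)\]/g );
}
```

Inside a Bio::SearchIO hit loop it would be called as organisms_from_description($hit->description), which only helps if the bracketed text actually survives in that string.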
> On Tue, 1 Jul 2003, ying lin wrote:
>
> > I thought that, since methods are provided to extract the name and
> > description as well as a lot of other hit information, there should be
> > some method to extract the hit's organism as well.
> >
> > Below is one example, excerpted from a BLAST report, of a hit whose
> > name and description are followed by the [ORGANISM] information:
> >
> > >gi|3348131|gb|AAC27796.1| cytoplasmic beta actin [Xenopus laevis]
> > gi|27735427|gb|AAH41203.1| similar to actin, beta, cytoplasmic [Xenopus
> > laevis]
> > Length = 375
> >
> > Score = 754 bits (1947), Expect = 0.0
> > Identities = 371/375 (98%), Positives = 375/375 (100%)
> > Frame = +2
> >
> > Query: 89   MEEEIAALVIDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS 268
> >             ME++IAALV+DNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS
> > Sbjct: 1    MEDDIAALVVDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS 60
> >
> > Query: 269  KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT 448
> >             KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT
> > Sbjct: 61   KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT 120
> >
> > Query: 449  QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL 628
> >             QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL
> > Sbjct: 121  QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL 180
> >
> > Query: 629  AGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSY 808
> >             AGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSY
> > Sbjct: 181  AGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSY 240
> >
> > Query: 809  ELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETTFNSIMKCDVDIRKDLYANTVLS 988
> >             ELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETT+NSIMKCDVDIRKDLYANTVLS
> > Sbjct: 241  ELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETTYNSIMKCDVDIRKDLYANTVLS 300
> >
> > Query: 989  GGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQ 1168
> >             GGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQ
> > Sbjct: 301  GGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQ 360
> >
> > Query: 1169 EYDESGPSIVHRKCF 1213
> >             EYDESGPSIVHRKCF
> > Sbjct: 361  EYDESGPSIVHRKCF 375
> >
> > Ying Lin
> > Dept. of Computer and Information Sciences
> > University of Delaware
> >
> > On Tue, 1 Jul 2003, Jason Stajich wrote:
> >
> > > Is the organism's name actually in the report?
> > >
> > > With SearchIO, $hit->name and $hit->description will give you
> > > whatever is in the report.
> > >
> > > -jason
> > > On Tue, 1 Jul 2003, ying lin wrote:
> > >
> > > > Hi, I have been trying to find a method to extract the organism name
> > > > of hits in a BLAST report, but I can't find one. Can anyone tell me
> > > > what it is? Thanks!
> > > >
> > > > Ying Lin
> > > > Dept. of Computer and Information Sciences
> > > > University of Delaware
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at portal.open-bio.org
> > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > >
> > > --
> > > Jason Stajich
> > > Duke University
> > > jason at cgt.mc.duke.edu
> > >
> >
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>