[Bioperl-l] How to extract organism information from blast report?
Jason Stajich
jason at cgt.duhs.duke.edu
Tue Jul 1 13:38:53 EDT 2003
locus and accession are parsed out. We've tried not to assume that every
report is for an NCBI-formatted database. I suppose we can add an
organism field when we recognize the header line (one could probably do
this too for Bio::Seq and species when we add an NCBIFasta parser at some
point).
At any rate, I assume this regexp will get you what you want:
my ($org) = ($hit->description =~ /\[([^\]]+)\]/ );
if( $org ) {
    # do something with the org name
}
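A slightly fuller sketch of the same idea, written as a standalone helper.
The name organisms_from_description is made up for illustration (it is not
a BioPerl method), and it assumes the organism is whatever sits inside
square brackets, which only holds for NCBI-style description lines. It
collects every bracketed name, since hits often carry several merged
entries, and collapses whitespace in case a name was wrapped across lines.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the distinct bracketed
# names found in an NCBI-style description string, in order of appearance.
sub organisms_from_description {
    my ($desc) = @_;
    return () unless defined $desc;
    my ( %seen, @orgs );
    while ( $desc =~ /\[([^\]]+)\]/g ) {
        my $org = $1;
        $org =~ s/\s+/ /g;     # collapse whitespace left by wrapped lines
        $org =~ s/^ | $//g;    # trim a leading or trailing space
        next if $org eq '';
        push @orgs, $org unless $seen{$org}++;
    }
    return @orgs;
}

# Description text taken from the hit quoted in this thread; the second
# organism name is deliberately left wrapped across a line break.
my $desc = 'cytoplasmic beta actin [Xenopus laevis] '
         . 'gi|27735427|gb|AAH41203.1| similar to actin, beta, cytoplasmic '
         . "[Xenopus\nlaevis]";

my @orgs = organisms_from_description($desc);
print join( '; ', @orgs ), "\n";
```

Inside a SearchIO loop you would pass it $hit->description. Bracketed text
that is not an organism (a ligand or an EC number, say) will come back as
well, so treat the result as a best guess.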
-jason
On Tue, 1 Jul 2003, ying lin wrote:
> I thought that, since methods to extract the name and description (as
> well as a bunch of other hit information) are provided, there should be
> some method to extract the hit's organism.
>
> Below is an example hit, excerpted from a BLAST report, with the
> [ORGANISM] information after the name and description:
>
> >gi|3348131|gb|AAC27796.1| cytoplasmic beta actin [Xenopus laevis]
> gi|27735427|gb|AAH41203.1| similar to actin, beta, cytoplasmic [Xenopus laevis]
> Length = 375
>
> Score = 754 bits (1947), Expect = 0.0
> Identities = 371/375 (98%), Positives = 375/375 (100%)
> Frame = +2
>
> Query: 89   MEEEIAALVIDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS 268
>             ME++IAALV+DNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS
> Sbjct: 1    MEDDIAALVVDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS 60
>
> Query: 269  KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT 448
>             KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT
> Sbjct: 61   KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT 120
>
> Query: 449  QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL 628
>             QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL
> Sbjct: 121  QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL 180
>
> Query: 629  AGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSY 808
>             AGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSY
> Sbjct: 181  AGRDLTDYLMKILTERGYSFTTTAEREIVRDIKEKLCYVALDFEQEMATAASSSSLEKSY 240
>
> Query: 809  ELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETTFNSIMKCDVDIRKDLYANTVLS 988
>             ELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETT+NSIMKCDVDIRKDLYANTVLS
> Sbjct: 241  ELPDGQVITIGNERFRCPEALFQPSFLGMESCGIHETTYNSIMKCDVDIRKDLYANTVLS 300
>
> Query: 989  GGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQ 1168
>             GGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQ
> Sbjct: 301  GGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQ 360
>
> Query: 1169 EYDESGPSIVHRKCF 1213
>             EYDESGPSIVHRKCF
> Sbjct: 361  EYDESGPSIVHRKCF 375
>
> Ying Lin
> Dept. of Computer and Information Sciences
> University of Delaware
>
> On Tue, 1 Jul 2003, Jason Stajich wrote:
>
> > Is the organism's name actually in the report?
> >
> > With SearchIO,
> > $hit->name, $hit->description will give you whatever is in the report.
> >
> > -jason
> > On Tue, 1 Jul 2003, ying lin wrote:
> >
> > > Hi, I have been trying to find the method to extract the organism name
> > > of hits in a BLAST report but can't find it. Can anyone tell me what
> > > it is? Thanks!
> > >
> > > Ying Lin
> > > Dept. of Computer and Information Sciences
> > > University of Delaware
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu
> >
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu