[Bioperl-l] XML BLAST parsing & accessions
Jason Stajich
jason@cgt.mc.duke.edu
Wed, 19 Jun 2002 21:18:46 -0400 (EDT)
TD - good to see you on list - this is entirely dependent on what BLAST
does, i.e. I implemented it so it just pulls what is in
<BlastOutput_query-def> </> into query_name and then takes the first
whitespace-delimited section, i.e. /(\S+)\s+(\S+)/, and makes that the
name, and the second one is the description. It tries to guess the
accession as well, based on the last '|'.
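
A minimal sketch of the splitting described above, in plain Perl with no
BioPerl dependency. The helper name split_query_def is hypothetical and this
is not the actual Bio::SearchIO code; it also assumes the description is
everything after the first token (not just the second word), and that the
accession guess is whatever follows the last '|' in the name.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl's actual code): split a query-def
# string into name, description, and a guessed accession.
sub split_query_def {
    my ($def) = @_;

    # First whitespace-delimited token is the name; the rest of the
    # line (possibly empty) is taken as the description.
    my ($name, $desc) = $def =~ /^\s*(\S+)\s*(.*)$/;
    return unless defined $name;

    # Guess the accession as the text after the last '|' in the name.
    # With no '|' at all this is simply the whole name; if the name
    # ends in '|', fall back to the whole name.
    my ($acc) = $name =~ /([^|]+)$/;
    $acc = $name unless defined $acc;

    return ($name, $desc, $acc);
}

my ($name, $desc, $acc) = split_query_def(
    'gnl|NCSU_FGL.blast|03E20.Contig1 M. grisea project xsal BAC03E20 Contig 1'
);
print "name: $name\n";
print "description: $desc\n";
print "accession: $acc\n";
```

If the query-def reaches the parser intact, a header like the one in the
quoted message would be expected to yield a non-empty name under this scheme,
so an empty query_name points at the input rather than at the header format.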
So, first off: what version of NCBI BLAST are you running? Can you cut and
paste the top part of the XML, the part with the BlastOutput... yadda, into
an email? I think either the tag names have changed (grr) or something else
is happening with the expected input.
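
For a quick look at what the report actually contains, something like the
following could be used. It is only a sketch: first_element_text is a
hypothetical helper, the XML fragment is made up for illustration, and the
<BlastOutput_version> tag name is an assumption about the report layout
(only <BlastOutput_query-def> is named above). A regex is used here purely
for eyeballing a report, not as a replacement for a real XML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of the first <TAG>...</TAG>
# element found in a string, or undef if the tag is absent.
sub first_element_text {
    my ($xml, $tag) = @_;
    my ($text) = $xml =~ m{<\Q$tag\E>\s*(.*?)\s*</\Q$tag\E>}s;
    return $text;
}

# Made-up fragment for illustration only, not real BLAST output.
my $xml = <<'XML';
<BlastOutput>
  <BlastOutput_program>blastn</BlastOutput_program>
  <BlastOutput_version>EXAMPLE-VERSION</BlastOutput_version>
  <BlastOutput_query-def>gnl|NCSU_FGL.blast|03E20.Contig1 M. grisea project xsal BAC03E20 Contig 1</BlastOutput_query-def>
</BlastOutput>
XML

# Report the two fields most relevant to the question above.
for my $tag (qw(BlastOutput_version BlastOutput_query-def)) {
    my $text = first_element_text($xml, $tag);
    print "$tag: ", (defined $text ? $text : '(not found)'), "\n";
}
```

A "(not found)" for the query-def tag would suggest the tag names differ
from what the parser expects.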
-jason
On Wed, 19 Jun 2002, T.D. Houfek wrote:
> A few days ago I decided to re-write a portion of a batch BLASTing system
> I'm working on so that it performs its (XML) report parsing using
> BioPerl(1.0) instead of my own home-grown parser. Specifically (in case
> there's a whole other way of going about this), I am creating a
> Bio::SearchIO object from a filehandle to an XML report:
>
>   my $searchio = Bio::SearchIO->new(-tempfile => 1,
>                                     -format   => 'blastxml',
>                                     -fh       => $blastReport);
>
> then $searchio->next_result() to get a Result object,
> whose ->next_hit() method coughs up Hit objects, which in turn cough up
> hsp objects with ->next_hsp().
>
> And it all is working beautifully, I must say. The only problem I have
> noticed, and it is kind of a problem, is that neither the Result object's
> ->query_name nor its ->query_accession method is returning anything for
> me. I'm working with FASTA headers that look like this:
>
> >gnl|NCSU_FGL.blast|03E20.Contig1 M. grisea project xsal BAC03E20 Contig 1
>
> and I'm trying to get the first part of the header out of the
> corresponding BLAST report, i.e.
>
> gnl|NCSU_FGL.blast|03E20.Contig1
>
> I would have expected either ->query_name or ->query_accession to return
> this. Have I violated a Bioperl expectation about header information
> format? (This format doesn't prevent the information from appearing in
> the XML reports themselves).
>
> I appreciate any help you can give me,
> TD
>
>
> T.D. Houfek
>
> system administrator
> Fungal Genomics Laboratory
> Center for Integrated Fungal Research (CIFR)
> North Carolina State University
> ph: (919)513-0025 e: tdhoufek@unity.ncsu.edu
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu