[Bioperl-l] XML BLAST parsing & accessions

Jason Stajich jason@cgt.mc.duke.edu
Thu, 20 Jun 2002 15:35:49 -0400 (EDT)


On Thu, 20 Jun 2002, T.D. Houfek wrote:

> Aha... that's gotta be the problem then.  In my output,
> <BlastOutput_query-def> has apparently already performed some operation
> like (\S+)\s+(\S+), and taken only $2.  So with a header line like:
>
> >gnl|NCSU_FGL.blast|03E20.Contig1  M. grisea project xsal BAC03E20 Contig 1
>
> I get something like this:
>
> <BlastOutput_query-def>M. grisea project xsal BAC03E20 Contig 1</BlastOutput_query-def>
>
> And the other needed information is currently put in a
> <BlastOutput_query-ID> tag:
>
> <BlastOutput_query-ID>gnl|NCSU_FGL.blast|03E20.Contig1</BlastOutput_query-ID>
>

How annoying.  The query-ID used to be filled with lcl|QUERY, or with a
BLAST queue ID if you were submitting a job on their servers.  We can add
some simple logic to detect this, though, and solve the problem, since we
should be getting the right things into the right slots.  I'll have a
look, or you can dig into the Bio::SearchIO::blastxml code if you're
feeling adventuresome....
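
For illustration only, here is a rough sketch of the kind of detection
logic meant above.  It is written in Python rather than Perl and is not
the blastxml.pm code; the function name split_query_header is
hypothetical, and the rule that a placeholder ID is empty or starts with
"lcl|" is an assumption (a server queue ID would need its own check,
which is not attempted here).

```python
def split_query_header(query_id, query_def):
    """Return (name, description) for the query sequence.

    Hypothetical helper: the placeholder rule below is an assumption,
    not something taken from the BLAST XML specification.
    """
    query_id = (query_id or "").strip()
    query_def = (query_def or "").strip()

    # Newer layout: query-ID already holds the real identifier and
    # query-def holds only the description.
    if query_id and not query_id.startswith("lcl|"):
        return query_id, query_def

    # Older layout: query-ID is a placeholder, so the whole header line
    # sits in query-def.  Split it at the first run of whitespace.
    parts = query_def.split(None, 1)
    name = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return name, description
```

With the values from your report, the ID is returned as the name and the
query-def as the description; with a "lcl|QUERY" placeholder, the name
is taken from the first word of the query-def instead.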

> I went to check what version I have and can't for the life of me figure
> out where the distribution hides the information (no -v or -V stuff seems
> to work... they tell you the info is in a file that isn't there, etc).
> But it is a very recent version; a few months ago they made changes to the
> format of their databases, and this version postdates that change.
>
It should actually be reported in the XML output file itself, in
something like <BlastOutput_version> and/or <BlastOutput_program>.
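
If it helps, a minimal sketch for pulling those two fields out of a
report, assuming they are direct children of the <BlastOutput> root
element.  This is Python using only the standard library; the function
name blast_program_and_version and the SAMPLE document (including its
placeholder version string) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed report; the version string is a placeholder.
SAMPLE = """<BlastOutput>
  <BlastOutput_program>blastn</BlastOutput_program>
  <BlastOutput_version>blastn X.Y.Z</BlastOutput_version>
</BlastOutput>"""


def blast_program_and_version(xml_text):
    """Return (program, version); either is None if its tag is absent."""
    root = ET.fromstring(xml_text)
    # Assumption: both tags sit directly under the root element.
    program = root.findtext("BlastOutput_program")
    version = root.findtext("BlastOutput_version")
    return program, version


if __name__ == "__main__":
    print(blast_program_and_version(SAMPLE))
```

For a real report on disk, ET.parse(path).getroot() can replace the
ET.fromstring call.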


>
> T.D. Houfek
>
> system administrator
> Fungal Genomics Laboratory
> Center for Integrated Fungal Research (CIFR)
> North Carolina State University
> ph: (919)513-0025  e: tdhoufek@unity.ncsu.edu
>
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu