[Bioperl-l] XML BLAST parsing & accessions

T.D. Houfek tdhoufek@unity.ncsu.edu
Thu, 20 Jun 2002 11:40:06 -0400 (EDT)


Hi Jason,

> TD - good to see you on list -

Thanks!  It's good to be here.  :-)

> this is entirely dependent on what BLAST
> does, i.e. I implemented it so it just pulls what is in
> <BlastOutput_query-def> </> into query_name and then it takes the first
> white space delimited section (i.e.) /(\S+)\s+(\S+)/ -- and makes that the
> name, and the second one is the description.  It tries to guess the
> accession as well based on the last '|'

Aha... that's gotta be the problem then.  In my output, BLAST has
apparently already performed some operation like /(\S+)\s+(.*)/ on the
header itself and put only $2 into <BlastOutput_query-def>.  So with a
header line like:

>gnl|NCSU_FGL.blast|03E20.Contig1  M. grisea project xsal BAC03E20 Contig 1

I get something like this:

<BlastOutput_query-def>M. grisea project xsal BAC03E20 Contig 1</BlastOutput_query-def>

And the other needed information is currently put in a
<BlastOutput_query-ID> tag:

<BlastOutput_query-ID>gnl|NCSU_FGL.blast|03E20.Contig1</BlastOutput_query-ID>
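To make the difference concrete, here is a minimal sketch in Python
(standard library only) of a parser that copes with both layouts.  It is
only an illustration, not Bioperl code: the function name
parse_query_header is hypothetical, and the "take whatever follows the
last '|'" accession guess is just my reading of the description quoted
above.  It assumes the two tags appear once per report, as in my output.

```python
import xml.etree.ElementTree as ET


def parse_query_header(xml_text):
    """Return (name, accession, description) for a BLAST XML report.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    query_id = (root.findtext(".//BlastOutput_query-ID") or "").strip()
    query_def = (root.findtext(".//BlastOutput_query-def") or "").strip()

    if query_id:
        # Newer layout: BLAST has already split the header, so the ID
        # lives in its own tag and query-def holds only the description.
        name, description = query_id, query_def
    else:
        # Older layout: query-def holds the whole header line, so split
        # on the first run of whitespace.
        parts = query_def.split(None, 1)
        name = parts[0] if parts else ""
        description = parts[1] if len(parts) > 1 else ""

    # Assumption: the accession is whatever follows the last '|'.
    accession = name.rsplit("|", 1)[-1]
    return name, accession, description


SAMPLE = """<BlastOutput>
  <BlastOutput_query-ID>gnl|NCSU_FGL.blast|03E20.Contig1</BlastOutput_query-ID>
  <BlastOutput_query-def>M. grisea project xsal BAC03E20 Contig 1</BlastOutput_query-def>
</BlastOutput>"""

if __name__ == "__main__":
    print(parse_query_header(SAMPLE))
```

With the two tags from my output, this gives the full
gnl|NCSU_FGL.blast|03E20.Contig1 string as the name, 03E20.Contig1 as the
guessed accession, and the rest of the header line as the description.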

I went to check what version I have and can't for the life of me figure
out where the distribution hides the information (no -v or -V stuff seems
to work... they tell you the info is in a file that isn't there, etc).
But it is a very recent version; a few months ago they made changes to the
format of their databases, and this version postdates that change.


T.D. Houfek

system administrator
Fungal Genomics Laboratory
Center for Integrated Fungal Research (CIFR)
North Carolina State University
ph: (919)513-0025  e: tdhoufek@unity.ncsu.edu