[Bioperl-l] Getting errors parsing TIGR XML in SeqIO
Jonathan.Moore at warwick.ac.uk
Wed Jul 1 10:04:24 UTC 2009
Thanks for the suggestion, Jason.
There is a bit of a gulf between the tigrxml test file and the TAIR9 Arabidopsis release in TIGR XML format. In BioPerl's tigrxml test file the top-level element is ASSEMBLY, whereas in the TAIR file ASSEMBLY sits two levels down in the hierarchy, nested inside TIGR and PSEUDOCHROMOSOME. In addition, the two main elements within the TAIR ASSEMBLY element, GENE_LIST and ASSEMBLY_SEQUENCE, don't appear in our test file at all. It looks like a bit of work would be needed to map one layout onto the other.
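For anyone wanting to make the same comparison, a quick way to see how two TIGR XML flavours differ is to dump the distinct element paths of each file and compare the two listings. The following is only a rough sketch in Python using the standard library, not anything from BioPerl. The SAMPLE string is a made-up stand-in that mirrors the nesting described above; it is not an excerpt from the TAIR9 release, and element_paths is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# Hypothetical miniature document mirroring the nesting described above.
# It is NOT taken from the real TAIR9 files.
SAMPLE = b"""<TIGR>
  <PSEUDOCHROMOSOME>
    <ASSEMBLY>
      <GENE_LIST/>
      <ASSEMBLY_SEQUENCE>ACGT</ASSEMBLY_SEQUENCE>
    </ASSEMBLY>
  </PSEUDOCHROMOSOME>
</TIGR>"""


def element_paths(source):
    """Return each distinct element path once, in order of first appearance."""
    stack = []      # tags of the currently open elements
    seen = set()    # paths already recorded
    ordered = []    # same paths, in first-seen order
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            path = "/".join(stack)
            if path not in seen:
                seen.add(path)
                ordered.append(path)
        else:
            stack.pop()
            elem.clear()  # drop text and children we no longer need
    return ordered


paths = element_paths(BytesIO(SAMPLE))
for path in paths:
    print(path)
```

Passing a filename (or an open binary file) instead of the BytesIO object should work the same way, so running it once on the file in t/data and once on a downloaded file gives two listings that can be diffed.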
>There are several flavors of TIGR XML for rice and Arabidopsis, and
>other projects etc, I don't know which is tracked with the current
>tigrxml version unfortunately but one can compare the test files in t/
>data to the versions downloaded to see what is currently supported.
>Usually the gbk will be more consistently parseable but we can try and
>work it out if it is a sensible transformation.
>> I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML
>> files at the TAIR FTP site.
>> I've tried SeqIO with both tigr and tigrxml formats but both are
>> giving errors in 1.6.0. Has anyone advice on whether it's likely to
>> be doable, or should I wait til the .gb files are available?
>> Jay Moore
>jason at bioperl.org