[Bioperl-l] Re: [SO-devel] GFF3 preliminary
Ewan Birney
birney at ebi.ac.uk
Wed Feb 19 06:42:18 EST 2003
On Tue, 18 Feb 2003, Mark Yandell wrote:
> Hi All,
>
>
> > "When asked why they
> > have modified the published Sanger specification, bioinformaticists
> > frequently answer that the format was insufficient for their needs...",
>
>
> So why not just use XML? You know, with, like, a real DTD, like the rest of
> the world, and be done with it?
>
That's what NCBI Seq XML, GAME XML or (new and shiny... talk to Michele)
Otter XML are for, and they solve specific problems.
With XML you can't:
  - use grep
  - use sort and sort -k and other twisted options of sort
  - use comm
  - use awk
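For anyone who hasn't leaned on these tools: a GFF feature line is nine tab-separated columns, so "all exons, sorted by sequence and start" is a one-line pipeline (something like grep -v '^#' f.gff | awk -F'\t' '$3 == "exon"' | sort -t$'\t' -k1,1 -k4,4n in bash). Below is a rough Python sketch of the same operation, purely to show how little machinery a line-oriented format needs. The sample records are made up, and the sketch assumes plain feature lines plus '#' comment/directive lines (no embedded ##FASTA section).

```python
import io

# Made-up GFF-style records: nine tab-separated columns per feature line
# (seqid, source, type, start, end, score, strand, phase, attributes).
SAMPLE = (
    "##gff-version 3\n"
    "chr2\tdemo\texon\t500\t900\t.\t+\t.\tID=e3\n"
    "chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=g1\n"
    "chr1\tdemo\texon\t100\t300\t.\t+\t.\tID=e1\n"
    "chr1\tdemo\texon\t50\t80\t.\t-\t.\tID=e0\n"
)


def features(lines):
    """Skip blank and '#' lines (like grep -v '^#') and split the rest on tabs."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        yield line.split("\t")


def exons_sorted(lines):
    # Roughly: grep -v '^#' f.gff | awk -F'\t' '$3 == "exon"' \
    #            | sort -t$'\t' -k1,1 -k4,4n
    exons = [f for f in features(lines) if f[2] == "exon"]
    # Sort by seqid as text, then by start coordinate as a number.
    return sorted(exons, key=lambda f: (f[0], int(f[3])))


for f in exons_sorted(io.StringIO(SAMPLE)):
    print(f[0], f[3], f[4], f[8])
```

The sort key mirrors -k1,1 -k4,4n: sequence name compared as text, start coordinate compared numerically.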
With XML you need:
  - a decent XML SAX parser in your language of choice to read it
    reliably (this is now pretty much there for most languages);
  - enough coding time to write a SAX-event-to-internal-data-structure
    layer that is tag-tolerant (after all, if you are going to be strict
    about the tags and not tolerate additional ones, why use XML?). This
    is nowhere near impossible, but nowhere near as simple as
    @fields = split;
  - endless discussions with people who are trying to solve related but
    distinct problems, only to discover that you need separate XML
    formats.
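To make the "tag-tolerant SAX handler versus split" comparison concrete, here is a minimal Python sketch (Python rather than Perl, purely for illustration). The XML layout, the element names (<feature>, <seqid>, <start>, <evidence>, ...) and the FeatureHandler class are all hypothetical; this is not GAME, Otter or NCBI XML. The point is only that unknown elements have to be skipped deliberately, whereas the tab-delimited record is a single split call.

```python
import xml.sax

# Hypothetical XML layout, invented for this example (not any real schema).
XML_DOC = b"""<features>
  <feature><seqid>chr1</seqid><type>exon</type><start>100</start>
    <end>300</end><evidence>est123</evidence></feature>
  <feature><seqid>chr2</seqid><type>gene</type><start>500</start><end>900</end></feature>
</features>"""

KNOWN = {"seqid", "type", "start", "end"}


class FeatureHandler(xml.sax.ContentHandler):
    """Turn SAX events into a list of dicts, ignoring tags it does not know."""

    def __init__(self):
        super().__init__()
        self.features = []
        self._current = None  # dict being filled for the open <feature>
        self._field = None    # known child element we are currently inside
        self._text = []       # text chunks (SAX may split an element's text)

    def startElement(self, name, attrs):
        if name == "feature":
            self._current = {}
        elif self._current is not None and name in KNOWN:
            self._field = name
            self._text = []
        # Any other tag (e.g. <evidence>) is silently tolerated.

    def characters(self, content):
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == self._field:
            self._current[name] = "".join(self._text).strip()
            self._field = None
        elif name == "feature":
            self.features.append(self._current)
            self._current = None


handler = FeatureHandler()
xml.sax.parseString(XML_DOC, handler)
print(handler.features)

# The tab-delimited equivalent of reading one record:
fields = "chr1\tdemo\texon\t100\t300\t.\t+\t.\tID=e1".split("\t")
print(fields[0], fields[2], fields[3], fields[4])
```

Even this toy handler has to track which element it is inside and buffer character data, because a SAX parser is allowed to deliver an element's text in several chunks.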
XML is a bad format, but undoubtedly the best format out there for complex
data.
XML simply doesn't replace tab-delimited formats, and we shouldn't mandate
the death of GFF and friends (e.g., GTF) just because XML formats are used
for complex data transfer.