[Bioperl-l] Bio::SeqIO::game

Matthew Pocock mrp@sanger.ac.uk
Mon, 04 Dec 2000 15:02:22 +0000


Hi.

At the risk of getting flamed, are GAME files in general a serialized
form of a sequence database rather than a serialized form of a sequence
stream? Maybe we can get around some of these issues by having a
sequence db builder using GAME. It doesn't get you over the issues with
memory usage, but at least the semantics become clear.
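
As a rough sketch of the distinction (in Python rather than Perl, and
with element names made up for illustration rather than taken from the
GAME DTD): a stream reader hands back one sequence at a time, while a
db builder reads the whole document and then allows lookup by id. The
function names stream_seqs and build_seq_db are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified GAME-like document; the element and attribute
# names are illustrative assumptions, not the real GAME DTD.
SAMPLE = b"""<game>
  <seq id="genomic1"><residues>ACGTACGT</residues></seq>
  <seq id="mrna1"><residues>ACGU</residues></seq>
</game>"""


def stream_seqs(handle):
    """Stream semantics: yield (id, residues) one record at a time."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "seq":
            yield elem.get("id"), elem.findtext("residues")
            elem.clear()  # drop the children of the record just emitted


def build_seq_db(handle):
    """Database semantics: read the whole document, then look up by id."""
    return dict(stream_seqs(handle))


db = build_seq_db(io.BytesIO(SAMPLE))
print(db["mrna1"])
```

The db version still holds every sequence in memory, which is the cost
mentioned above, but a caller can at least ask for any sequence by id
regardless of the order in which it appeared in the file.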

Matthew

Bradley Marshall wrote:

> --- Ewan Birney <birney@ebi.ac.uk> wrote:
> > On Fri, 1 Dec 2000, Bradley Marshall wrote:
> >
> > >
> > > How about this as a solution?
> > >
> > > We'll add a top level attribute and/or tag describing
> > > whether or not the document is "chunkable".  Chris
> > > suggested we have a top level <flavor> element. This
> > > can specify whether or not the document is chunkable.
> > > A chunkable document would have this structure:
> > >
> >
> > ;).
> >
> > I think all useful documents will be chunkable.
>
> I agree that this is the case for large data transfer
> jobs like you're talking about.  A question we have is
> whether or not you're planning on transferring only
> genomic seqs w/ features or if you're doing mixed
> files - with genomic seqs' features forming mRNA and
> AA sequences.  It is this second case in which keeping
> things "chunkable" becomes difficult.
>
> But this flexibility is also a major advantage of the
> GAME format. And even if a document is NOT chunkable,
> parsing performance is pretty good for non-huge
> documents.  We still need to deal with the file-handle
> issue....
>
> Brad
>
> > I'd claim that we're just letting ourselves into trouble
> > if we allow badly compacted XML to be "valid".
> >
> > This solution is ok, but I would argue that it is better
> > to be strict about these things, otherwise implementations
> > either will have to throw exceptions on non-chunkable
> > documents or have other poorly defined criteria....