New Bio::Seq and Bio::Seq::Parse (.025 BETA)
Georg Fuellen
fuellen@dali.Mathematik.Uni-Bielefeld.DE
Tue, 18 Mar 1997 19:38:01 +0000 (GMT)
Steve wrote,
about ``always have valid objects''
> I'm not sure this is always possible. When it is possible, it may require
> doing non-intuitive things. What if someone initializes a bio::seq to be
> DNA but puts in illegal bases?
I would carp, so that the user knows that warnings and even fatal errors are
possible later on.
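
A minimal sketch of this carp-and-continue idea, written in Python rather than Perl. The function `make_seq`, the alphabet, and the record layout are hypothetical and are not part of Bio::Seq; `warnings.warn` stands in for Perl's `carp`.

```python
import warnings

# Assumed alphabet for illustration: the four bases plus N for "unknown".
DNA_ALPHABET = set("ACGTN")

def make_seq(residues, seq_type="dna"):
    """Build a sequence record, warning (not failing) on illegal bases."""
    illegal = []
    if seq_type == "dna":
        illegal = sorted(set(residues.upper()) - DNA_ALPHABET)
    if illegal:
        # Analogue of carp: report the problem but still return the object.
        warnings.warn(f"illegal DNA characters: {''.join(illegal)}")
    return {"type": seq_type, "residues": residues, "valid": not illegal}
```

The object is still created, but it is flagged, so later operations can decide whether to warn again or to fail.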
> > > > The problem w/ comma-separated is that according to our current
> > > > specs, comma is a legal component of an ID; we only carp on whitespace.
> > > > In other words, ``Mus,musculus'' is a legal ID.
> > > > Since non-whitespace is also a legal component of filenames on many systems
> > > > I believe, I'd like to keep the convention.
> > >
> > > I thought ID's had to be in '\s'; if not, maybe they should be. Further,
> >
> > Do you mean ``\S'', i.e. everything but space and ``\t\n\r\f'' ??
>
> Ooops. Meant \w. Sorry!!!
I strongly believe we should support ids with ``\S''; I've seen them in
Fasta-files, Nexus files, etc, etc.
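
The difference between the two conventions can be sketched as follows (Python's `re` module uses the same `\S` and `\w` classes as Perl; the helper names are hypothetical):

```python
import re

# Convention argued for here: an id is one or more non-whitespace characters.
NONSPACE_ID = re.compile(r"\S+")
# Stricter alternative from the quoted text: word characters only.
WORD_ID = re.compile(r"\w+")

def is_legal_id(seq_id):
    """True if seq_id is non-empty and contains no whitespace."""
    return NONSPACE_ID.fullmatch(seq_id) is not None

def is_word_id(seq_id):
    """True if seq_id consists only of letters, digits and underscores."""
    return WORD_ID.fullmatch(seq_id) is not None
```

Under the `\S` rule ``Mus,musculus'' is accepted; under the `\w` rule it is rejected because of the comma.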
> > > whitespace is a legal component of most filesystems. (It is on Unix,
> > > Macintosh, and Windows, for example).
> >
> > Space (`` '') may be OK, but newline (``\n'') certainly not ?!
>
> On unix, at least, \n certainly is valid. Typically the only illegal
> characters are "\0" and "/". Some filesystems even allow those.
I guess I failed to say that I'm talking about filenames. In a lot of
cases, the filename will give rise to the (default) id and vice versa.
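
One way the filename-to-id direction might look, as a sketch only: the function name is hypothetical, and dropping the file extension is an assumption, not something the current code is known to do.

```python
import re
import warnings
from pathlib import PurePosixPath

def default_id_from_filename(path):
    """Derive a default id from a filename, warning if it contains whitespace."""
    # Assumption: the id is the final path component without its extension.
    stem = PurePosixPath(path).stem
    if re.search(r"\s", stem):
        # Whitespace may be legal in a filename, but we carp on it in an id.
        warnings.warn(f"id {stem!r} contains whitespace")
    return stem
```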
> We discussed this before with Fasta/FastA/FASTA/fasta. Changing all of
> these to lower case follows the same rationale (DNA/dna/Dna?)
> OtherSeq/Otherseq/otherseq. Everything should be kept consistent, and
> lower case is easy for this.
Ok, ok, will use dna,rna,protein in the future...
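
A sketch of what accepting any capitalisation but storing lower case could look like. `normalize_type` is a hypothetical helper, and the set of known types is limited to the three named above.

```python
# Only the three type names mentioned in this thread; others are an open question.
KNOWN_TYPES = {"dna", "rna", "protein"}

def normalize_type(name):
    """Map DNA/Dna/dna etc. to the canonical lower-case type name."""
    lowered = name.strip().lower()
    if lowered not in KNOWN_TYPES:
        raise ValueError(f"unknown sequence type: {name!r}")
    return lowered
```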
> > Hm. What about ids that we inherit from somewhere ? E.g. from a file ?
> > On a parallel machine, this won't work either I think. What about other
> > distributed computation; CORBA may offer solutions, but it's another
> > big can of worms although I feel that we'll have to open it at some time -
> > does anyone know more about CORBA ? (I've just heard rumors! :)
>
> Why would you inherit ids? These ids are ONLY for setting names of
> bio::seq's. I don't see how parallel programs and/or CORBA have anything
> to do with it. We only need to guarantee that id's are unique within a
> given program.
For me (and in the current code, e.g. parse_fasta), the ids are the identifiers
you find in files, etc, etc. It seems that you're introducing a new notion of
id, the merit of which is rather unclear to me.
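
To illustrate the notion of id meant here, a sketch of pulling the identifier out of a FASTA header line. This is not the actual parse_fasta code; `fasta_id` is hypothetical, and it assumes the id is the first whitespace-delimited token after the ``>''.

```python
def fasta_id(header_line):
    """Return the id from a FASTA header line, or None if there is none."""
    if not header_line.startswith(">"):
        return None
    # Everything after '>' up to the first whitespace is taken as the id;
    # the rest of the line is treated as free-text description.
    tokens = header_line[1:].split()
    return tokens[0] if tokens else None
```

With this reading, the id comes from the file and is not generated by the program, which is why uniqueness within one program cannot simply be assumed.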
best wishes,
georg