New Bio::Seq and Bio::Seq::Parse (.025 BETA)
Steven E. Brenner
brenner@akamail.com
Fri, 21 Mar 1997 15:30:08 +0900 (JST)
I'm only online for about an hour before I disappear again for over a
week. Here are some very quick thoughts.
On Tue, 18 Mar 1997, Georg Fuellen wrote:
> Steve wrote,
> I would carp and then the user knows that warnings and even fatal errors are
> possible later on.
Sounds reasonable. I can go with this.
> I strongly believe we should support ids with ``\S''; I've seen them in
> Fasta-files, Nexus files, etc, etc.
I would prefer we document it as restricted to \w, but not enforce that.
We might want to plan to allow \S eventually.
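
To make "documented as \w, but not enforced" concrete, here is a minimal Perl sketch. The helper name check_id and its return labels are hypothetical and are not part of Bio::Seq; the sketch only classifies an id and never rejects one.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Seq): report which class an id
# falls into, without refusing any of them.
sub check_id {
    my ($id) = @_;
    return 'empty' unless defined $id && length $id;
    # \A and \z anchor the whole string, so a trailing newline is not ignored.
    return 'documented' if $id =~ /\A\w+\z/;   # the documented form
    return 'tolerated'  if $id =~ /\A\S+\z/;   # e.g. ids taken from Fasta files
    return 'whitespace';                       # contains a space, tab, newline, ...
}

print check_id('gi|12345'), "\n";
```

A caller could carp on anything that is not 'documented' and carry on, which keeps the door open for allowing \S later.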
> > > > whitespace is a legal component of most filesystems. (It is on Unix,
> > > > Macintosh, and Windows, for example).
> > >
> > > Space (`` '') may be OK, but newline (``\n'') certainly not ?!
> >
> > On unix, at least, \n certainly is valid. Typically the only illegal
> > characters are "\0" and "/". Some filesystems even allow those.
>
> I guess I failed to say that I'm talking about filenames. In a lot of
> cases, the filename will give rise to the (default) id and vice versa.
Yes. *filenames* on Macintosh, Unix, and Windows allow spaces, and on Unix
the other \s characters are allowed as well.
> > > Hm. What about ids that we inherit from somewhere ? E.g. from a file ?
> > > On a parallel machine, this won't work either I think. What about other
> > > distributed computation; CORBA may offer solutions, but it's another
> > > big can of worms although I feel that we'll have to open it at some time -
> > > does anyone know more about CORBA ? (I've just heard rumors! :)
> >
> > Why would you inherit ids? These ids are ONLY for setting names of
> > bio::seq's. I don't see how parallel programs and/or CORBA have anything
> > to do with it. We only need to guarantee that id's are unique within a
> > given program.
>
> For me (and in the current code, e.g. parse_fasta), the ids are the identifiers
> you find in files, etc, etc. It seems that you're introducing a new notion of
> id, the merit of which is rather unclear to me.
I think we're talking at cross-purposes. The original question was about
what to do when the user fails to give any id whatsoever. The uniq_id's
generated by the code are used only when the user has failed to provide
an id. Currently, the code just sets the id to "_no_id_given" or something
to that effect. I suggested that it be set to something like
"_no_id_xxxxxx", where xxxxxx is some number unique within the program, so
that the different sequences have different names.
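
A minimal Perl sketch of that suggestion follows. It assumes a simple per-program counter; the function name default_id and the six-digit zero padding are illustrative choices, not what the current code does.

```perl
use strict;
use warnings;

# Counter shared by every call within one running program.
my $no_id_count = 0;

# Hypothetical helper: keep an id the user supplied, otherwise
# make up a placeholder that is unique within this program.
sub default_id {
    my ($id) = @_;
    return $id if defined $id && length $id;
    return sprintf('_no_id_%06d', ++$no_id_count);
}

print default_id(undef), "\n";
```

Ids that come from a file or from the user pass through untouched, so this only needs to guarantee uniqueness within a given program.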