[Bioperl-l] Re: [Open-bio-l] seq namespace method
Matthew Pocock
matthew_pocock@yahoo.co.uk
Tue, 09 Jul 2002 13:08:06 +0100
Hi Hilmar,
Glad to see that someone is looking over the BioSQL stuff again. Just
some random thoughts.
Can a sequence belong to more than one namespace? It depends what you
want the namespace to mean. For example, if you had five sequence
databases, one shadowed from persistent storage (indexed flat, sql,
corba, whatever) and four in-memory databases all with names (genbank,
my interesting sequences, blast hits... ) then a single sequence object
could be in all of them. If the namespace is meant to represent the name
of the collection it is part of, this becomes ambiguous. The sql schema
puts the namespace on the sequence because it needs somewhere to hold
the 'sequence is part of database' relation. In sql, this goes with the
sequence. In oopy collections, this goes in the database. If the
namespace is some
meta-data about the publisher, then the rich sequence could have a slot
for this in the interface or as a well known type of annotation (which
may be the same object as a database uses to publish its meta data). Do
you want to be able to use some sequence ID in conjunction with the
namespace to re-fetch the sequence at a later time? If so, how much
information would you need to store, and how much is discovered at
sequence-resolution-time? Are namespaces independent (or potentially
independent) of where the sequence was fetched from? How does this
relate to the bio-directories stuff?
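
To make the sequence-versus-collection point concrete, here is a rough
sketch in Python rather than BioPerl. The class names (Seq, SeqDB), the
helper namespaces_of and the accession X12345 are all made up for
illustration and are not part of any Bio:: interface. It shows
membership held by the collections, so one sequence object ends up with
more than one "namespace":

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Seq:
    """Hypothetical bare sequence record; it knows nothing about collections."""
    accession: str
    residues: str


@dataclass
class SeqDB:
    """Hypothetical named collection; membership lives here, not on the sequence."""
    name: str
    members: dict = field(default_factory=dict)

    def add(self, seq: Seq) -> None:
        self.members[seq.accession] = seq


def namespaces_of(seq, databases):
    """Names of every collection that holds this exact sequence object."""
    return [db.name for db in databases if db.members.get(seq.accession) is seq]


seq = Seq("X12345", "ACGT")  # made-up accession and residues
genbank = SeqDB("genbank")
interesting = SeqDB("my interesting sequences")
blast_hits = SeqDB("blast hits")
for db in (genbank, interesting):
    db.add(seq)

print(namespaces_of(seq, [genbank, interesting, blast_hits]))
# → ['genbank', 'my interesting sequences']
```

With a single namespace slot on the sequence itself, the same object
would have to pick one of those two names, which is the ambiguity
described above.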
Bottom line:
what does namespace mean?
is this best represented at the level of the sequence or the sequence
collection?
are you re-inventing URNs / naming and directory / name resolvers?
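
For the re-fetch and name-resolver questions, a minimal sketch of what
"namespace plus ID" resolution could look like, again in Python. The
registry, the 'namespace:id' string form and the in-memory stand-in for
a databank are assumptions for illustration, not an existing API:

```python
from typing import Callable, Dict

# Hypothetical registry: namespace name -> function that fetches by local ID.
RESOLVERS: Dict[str, Callable[[str], str]] = {}


def register(namespace: str, fetch: Callable[[str], str]) -> None:
    """Bind a namespace to whatever source currently serves it."""
    RESOLVERS[namespace] = fetch


def resolve(name: str) -> str:
    """Resolve a 'namespace:id' name; the source is chosen at resolution time."""
    namespace, sep, local_id = name.partition(":")
    if not sep or namespace not in RESOLVERS:
        raise KeyError(f"no resolver for {name!r}")
    return RESOLVERS[namespace](local_id)


# In-memory stand-in for a real databank (made-up accession and residues).
_fake_genbank = {"X12345": "ACGT"}
register("genbank", _fake_genbank.__getitem__)

print(resolve("genbank:X12345"))
# → ACGT
```

Here the stored name carries only the namespace and the ID; where the
sequence actually comes from is decided when resolve() is called, so the
namespace stays independent of the place it was first fetched from.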
Matthew
Hilmar Lapp wrote:
> According to BioSQL, sequences (bioentries) live in a namespace, e.g., the name of the databank that maintains and/or serves them.
>
> None of the Bio:: seq objects/interfaces have a method for that.
>
> I propose to add one, specifically to the lowest level Bio::PrimarySeqI (bioentries are pretty general, and a namespace is needed for /any and all/ bioentries). To me, the namespace doesn't have to do much with whether this seq is going to be stored in BioSQL or not. A sequence with an accession number has (implicitly or explicitly) a namespace in which this accession number is valid. PrimarySeqI has an accession.
>
> Does anyone have other suggestions or objections?
>
> -hilmar