[Bioperl-l] Re: [Open-bio-l] seq namespace method

Matthew Pocock matthew_pocock@yahoo.co.uk
Tue, 09 Jul 2002 13:08:06 +0100


Hi Hilmar,

Glad to see that someone is looking over the BioSql stuff again. Just 
some random thoughts.

Can a sequence belong to more than one namespace? It depends what you 
want the namespace to mean. For example, if you had five sequence 
databases, one shadowed from persistent storage (indexed flat, sql, 
corba, whatever) and four in-memory databases all with names (genbank, 
my interesting sequences, blast hits... ) then a single sequence object 
could be in all of them. If the namespace is meant to represent the name 
of the collection it is part of, this becomes ambiguous. The sql has 
this where it is because you need somewhere for the 'sequence is part of 
database' relation. In sql, this goes with the sequence. In oopy 
collections, this goes in the database. If the namespace is some 
meta-data about the publisher, then the rich sequence could have a slot 
for this in the interface or as a well known type of annotation (which 
may be the same object as a database uses to publish its meta data). Do 
you want to be able to use some sequence ID in conjunction with the 
namespace to re-fetch the sequence at a later time? If so, how much 
information would you need to store, and how much is discovered at 
sequence-resolution-time? Are namespaces independent (or potentially 
independent) of where the sequence was fetched from? How does this 
relate to the bio-directories stuff?
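
To make the "namespace lives with the collection" option concrete, here is 
a rough sketch in Python rather than Perl, just to keep it short. Nothing 
in it is BioPerl or BioSQL API: Seq, SeqDB, namespaces_of and the accession 
X12345 are all made up for illustration. The point is only that one 
sequence object can sit in several named collections, so "its namespace" 
is a list unless you pick one.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Seq:
    """Hypothetical sequence object; it carries no namespace of its own."""
    accession: str
    residues: str


@dataclass
class SeqDB:
    """Hypothetical named collection; the name plays the role of the namespace."""
    name: str
    entries: dict = field(default_factory=dict)

    def add(self, seq):
        # Index the sequence by its accession within this collection.
        self.entries[seq.accession] = seq


def namespaces_of(seq, dbs):
    """Return the names of all collections holding this exact object."""
    return [db.name for db in dbs if db.entries.get(seq.accession) is seq]


genbank = SeqDB("genbank")
blast_hits = SeqDB("blast hits")

seq = Seq("X12345", "ACGT")  # made-up accession and residues
genbank.add(seq)
blast_hits.add(seq)

memberships = namespaces_of(seq, [genbank, blast_hits])
```

Here memberships contains both collection names, which is the ambiguity 
described above if the namespace is taken to mean "the collection this 
sequence is part of".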

Bottom line:
   what does namespace mean?
   is this best represented at the level of the sequence or the sequence 
collection?
   are you re-inventing URNs / naming and directory / name resolvers?
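
On the last question, this is roughly what I mean by a name resolver, again 
as a hedged Python sketch with invented names (Resolver, register, resolve 
are not from any existing library). It assumes the only thing stored with 
the sequence is a (namespace, accession) pair and that everything else is 
discovered at resolution time.

```python
class Resolver:
    """Hypothetical registry mapping a namespace to a way of fetching from it."""

    def __init__(self):
        self._sources = {}

    def register(self, namespace, fetch):
        # fetch is any callable that takes an accession and returns a sequence.
        self._sources[namespace] = fetch

    def resolve(self, namespace, accession):
        # Look up the source for the namespace, then delegate the fetch to it.
        try:
            fetch = self._sources[namespace]
        except KeyError:
            raise LookupError(
                f"no source registered for namespace {namespace!r}"
            ) from None
        return fetch(accession)


# Stand-in for a real databank: an in-memory dict with a made-up entry.
store = {"X12345": "ACGT"}

resolver = Resolver()
resolver.register("genbank", store.__getitem__)

resolved = resolver.resolve("genbank", "X12345")
```

If the namespace is independent of where the sequence was fetched from, 
only the register call changes when the data moves; the stored pair stays 
the same.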

Matthew

Hilmar Lapp wrote:
> According to BioSQL, sequences (bioentries) live in a namespace, e.g., the name of the databank that maintains and/or serves them.
> 
> None of the Bio:: seq objects/interfaces have a method for that.
> 
> I propose to add one, specifically to the lowest level Bio::PrimarySeqI (bioentries are pretty general, and a namespace is needed for /any and all/ bioentries). To me, the namespace doesn't have to do much with whether this seq is going to be stored in BioSQL or not. A sequence with an accession number has (implicitly or explicitly) a namespace in which this accession number is valid. PrimarySeqI has an accession.
> 
> Anyone has other suggestions, objections?
> 
> 	-hilmar