[Bioperl-l] Question - Primary_id
Ewan Birney
birney@ebi.ac.uk
Mon, 23 Dec 2002 13:29:12 +0000 (GMT)
In Bioperl I inflicted one of my annoying views of the world: that there
should be 3 ids associated with a sequence:
display_id => what you show to the user
accession_number => unique id for the biological object
primary_id => unique id for an implementation
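As a rough illustration of the three ids side by side, here is a minimal sketch using Bio::Seq, assuming Bioperl is installed. The identifier values and the residues are placeholders chosen for the example, not taken from a real parse.

```perl
use strict;
use warnings;
use Bio::Seq;

# Placeholder values: the ids and residues below are illustrative only.
my $seq = Bio::Seq->new(
    -seq              => 'MDSISVTNAKF',   # placeholder residues
    -alphabet         => 'protein',
    -display_id       => 'OVAX_CHICK',    # what you show to the user
    -accession_number => 'P01013',        # id for the biological object
    -primary_id       => '129295',        # implementation-specific id
);

print "display_id:       ", $seq->display_id,       "\n";
print "accession_number: ", $seq->accession_number, "\n";
print "primary_id:       ", $seq->primary_id,       "\n";
```

Code that only ever reads display_id and accession_number would be unaffected if primary_id were later removed.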
Of these three, display_id and accession_number have pretty
consistent semantics and usage, whereas primary_id frankly should probably
be junked: it is too complex an idea to be enforced, and by definition is
implementation specific. In addition, the name "primary_id" is way too
loaded; "implementation_specific_id", for example, would have been a much
better name.
My apologies. A number of people (Hilmar, Lincoln and others) have sort of
questioned this slightly in the past and then dropped it.
My view for the code is:
for 1.2:
- put a "this is going to be deprecated" note in the documentation and
stress that this is not going to be
for 1.3/4 series -
- remove primary_id completely
Do people think this is silly or not?
Now, the question is what we do for the fasta parser. Fasta files
officially have only one id and ... then a completely divergent way of handling
other ids associated with it (the most common of which is the NCBI |
symbol system).
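To make the single-id-plus-pipes situation concrete, here is a minimal sketch in plain Perl of splitting a fasta header into its one official id, the description, and the |-separated fields inside the id. The helper name parse_fasta_header and the sample header are assumptions for illustration; this is not how the Bioperl fasta parser is implemented.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): split a fasta header line
# into the official id, the free-text description, and the
# '|'-separated fields of the id.
sub parse_fasta_header {
    my ($line) = @_;
    $line =~ s/^>//;      # drop the leading '>' if present
    $line =~ s/\s+$//;    # drop trailing whitespace and newline
    # The first whitespace-delimited token is the id; the rest is description.
    my ($id, $desc) = split /\s+/, $line, 2;
    $id   = '' unless defined $id;
    $desc = '' unless defined $desc;
    # NCBI-style ids pack several identifiers into one token with '|'.
    my @fields = split /\|/, $id;
    return { id => $id, desc => $desc, fields => \@fields };
}

# Illustrative NCBI-style header.
my $h = parse_fasta_header(">gi|129295|sp|P01013|OVAX_CHICK GENE X PROTEIN\n");
print "id:     $h->{id}\n";
print "desc:   $h->{desc}\n";
print "fields: @{ $h->{fields} }\n";
```

The sketch deliberately stops at a flat list of fields: deciding which field is an accession and which is a database tag depends on the database, which is exactly the divergence described above.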
Currently in the fasta parser we have:
    $seq = $self->sequence_factory->create(
        -seq      => $sequence,
        -id       => $id,
        # Ewan's note - I don't think this healthy
        # but obviously to taste.
        #-primary_id => $id,
        -desc     => $fulldesc,
        -alphabet => $alphabet,
        -direct   => 1,
    );
Any thoughts about this?