[Bioperl-l] Question - Primary_id

Mon, 23 Dec 2002 13:29:12 +0000 (GMT)

In Bioperl I inflicted one of my annoying views of the world: that there
should be 3 ids associated with a sequence:

  display_id => what you show to the user
  accession_number => unique id for the biological object
  primary_id => unique id for an implementation

Of these three, display_id and accession_number have pretty
consistent semantics and usage, whereas primary_id frankly should probably
be junked - it is too complex an idea to be enforced, and by definition is
implementation specific. In addition, the name "primary_id" is way too
loaded; "implementation_specific_id", for example, would have been a much
better name.

My apologies. A number of people (Hilmar, Lincoln and others) have
questioned this in the past and then dropped it.

My view for the code is to -

   for 1.2:

   - put a "this is going to be deprecated" note in the documentation and
stress that this is not going to be

   for 1.3/4 series -

   - remove primary_id completely
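
As a rough sketch of what the 1.2 step could look like in code: the
accessor keeps working but warns on every call. My::Seq is a made-up
stand-in class, not the Bioperl implementation, and the warning text and
the example ids are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence class; NOT the Bioperl code.
package My::Seq;

use Carp qw(carp);

sub new {
    my ($class, %args) = @_;
    return bless {
        display_id       => $args{-display_id},
        accession_number => $args{-accession_number},
        primary_id       => $args{-primary_id},
    }, $class;
}

# What you show to the user.
sub display_id {
    my $self = shift;
    $self->{display_id} = shift if @_;
    return $self->{display_id};
}

# Unique id for the biological object.
sub accession_number {
    my $self = shift;
    $self->{accession_number} = shift if @_;
    return $self->{accession_number};
}

# Implementation-specific id: still works, but warns on every call
# (assumed wording for the deprecation message).
sub primary_id {
    my $self = shift;
    carp "primary_id is deprecated and will be removed in a later release";
    $self->{primary_id} = shift if @_;
    return $self->{primary_id};
}

package main;

# Made-up ids, purely for illustration.
my $seq = My::Seq->new(
    -display_id       => 'example_seq',
    -accession_number => 'XX000001',
    -primary_id       => 'internal-42',
);

print $seq->display_id, "\n";
print $seq->primary_id, "\n";    # also emits the deprecation warning on STDERR
```

Warning through carp reports the caller's file and line, so people can
find the calls they need to change before the method goes away.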

Do people think this is silly or not?

Now, the question is: what do we do for the fasta parser? Fasta files
officially have only one id, and then a completely divergent way of handling
other ids associated with it (the most common of which is the NCBI |
symbol system).
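
To make the problem concrete, here is a minimal sketch in plain Perl of
pulling the one official id, the description, and the |-separated fields
out of a header line. parse_fasta_header is a hypothetical helper, not
part of the Bioperl fasta parser, and the header string is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: split a fasta header line into the single official
# id (first whitespace-delimited token), the remaining description, and
# any pipe-separated fields found inside the id.
sub parse_fasta_header {
    my ($line) = @_;
    $line =~ s/^>//;      # drop the leading '>'
    $line =~ s/\s+$//;    # drop trailing whitespace and newline

    my ($id, $desc) = split /\s+/, $line, 2;
    $id   = '' unless defined $id;
    $desc = '' unless defined $desc;

    # Limit of -1 keeps trailing empty fields, e.g. an id ending in '|'.
    my @fields = split /\|/, $id, -1;

    return { id => $id, desc => $desc, fields => \@fields };
}

# Made-up header in the NCBI pipe style, for illustration only.
my $h = parse_fasta_header('>gi|12345|gb|XX000001.1|EXAMPLE made-up description');
print "id:     $h->{id}\n";
print "desc:   $h->{desc}\n";
print "fields: @{ $h->{fields} }\n";
```

The sketch deliberately does not decide which of the fields should become
the display_id or the accession_number; that is exactly the open question.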

Currently in the fasta parser we have:

   $seq = $self->sequence_factory->create(
                                           -seq         => $sequence,
                                           -id          => $id,
                                           # Ewan's note - I don't think this is healthy,
                                           # but obviously it is a matter of taste.
                                           #-primary_id  => $id,
                                           -desc        => $fulldesc,
                                           -alphabet    => $alphabet,
                                           -direct      => 1,
                                           );

Any thoughts about this?