[Biopython-dev] SeqRecord id behavior

Peter Cock p.j.a.cock at googlemail.com
Tue May 29 22:02:20 UTC 2012


On Tue, May 29, 2012 at 10:32 PM, Lenna Peterson <arklenna at gmail.com> wrote:
> Hi all,
>
> I have some questions/comments regarding how SeqRecord handles various
> arguments.
>
>>>> print SeqRecord(seq="G")
> ID: <unknown id>
> Name: <unknown name>
> Description: <unknown description>
> Number of features: 0
> 'G'
>>>> print SeqRecord(seq="G", id=2)
> TypeError: id argument should be a string
>>>> print SeqRecord(seq="G", id=None)
> Name: <unknown name>
> Description: <unknown description>
> Number of features: 0
> 'G'
>
> 1. Couldn't a sequence id hypothetically be an integer? In which
> case, it could be converted to a string.

We want to be able to assume a string for things like the
string formatting operators used in SeqRecord output
(dealing with None as a special case is annoying enough).
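
As a rough illustration (a minimal sketch, not the actual Biopython
code; validate_id is a hypothetical helper that mirrors the check
quoted below, using str in place of basestring so that it runs on
Python 3), the caller converts an integer identifier explicitly and
everything downstream can then rely on having a string:

```python
def validate_id(id):
    """Hypothetical helper mirroring the quoted SeqRecord check."""
    # None is still allowed as a "missing" marker; anything else must be a string.
    if id is not None and not isinstance(id, str):
        raise TypeError("id argument should be a string")
    return id


# An integer identifier is rejected rather than silently converted...
message = None
try:
    validate_id(2)
except TypeError as err:
    message = str(err)

# ...so the caller converts it explicitly before building the record.
record_id = validate_id(str(2))

# Code that concatenates or calls str methods can now assume a string;
# ">" + 2 or ">" + None would raise a TypeError here.
header = ">" + record_id
```

Keeping the conversion in the calling code means the record never has
to guess how a non-string identifier should be rendered.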

> 2. Regarding this comment on line 180:
> https://github.com/biopython/biopython/blob/master/Bio/SeqRecord.py#L180
>
>    if id is not None and not isinstance(id, basestring):
>        #Lots of existing code uses id=None... this may be a bad idea.
>        raise TypeError("id argument should be a string")
>
> Why might that be a bad idea? id=None will currently set self.id to
> None, so it doesn't affect the type checking.

Allowing None for the ID means code cannot safely assume
the ID is a string (but see below).

> 3. Is it desirable to be able to remove the id from the __str__
> representation,

No - the sequence and the ID are the two most important
bits of a SeqRecord.

> or would it be more consistent to do this:
>
>    if id == "<unknown id>" or id is None:
>        self.id = "<unknown id>"
>    else:
>        (typecheck here)
>
> Lenna

I never liked the fact that "<unknown id>" has a space in it.
This breaks an assumption made by many file formats (that
the ID contains no whitespace). Many file formats don't like
an empty ID either, so maybe "<unknown-id>" is better. On the
other hand, it is fairly common in Python to use None to
represent missing data... which the SeqRecord currently
allows you to do.
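
To show the whitespace problem concretely, here is a small sketch
assuming a FASTA-style header where the ID is taken to be the first
whitespace-delimited word after the ">". The two helper functions are
hypothetical stand-ins written for this example, not Biopython calls:

```python
def fasta_header(record_id, description):
    """Hypothetical writer: build a FASTA-style title line."""
    return ">%s %s" % (record_id, description)


def parse_fasta_id(header_line):
    """Hypothetical reader: the ID is the first word after the '>'."""
    return header_line[1:].split(None, 1)[0]


# The current default contains a space, so it does not survive a round trip.
with_space = parse_fasta_id(fasta_header("<unknown id>", "<unknown description>"))

# A default without whitespace comes back unchanged.
without_space = parse_fasta_id(fasta_header("<unknown-id>", "<unknown description>"))
```

Here with_space ends up as "<unknown" while without_space is still
"<unknown-id>".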

Note these SeqRecord defaults predate Bio.SeqIO - if we
didn't have to worry about breaking existing code I would
much rather make the ID a mandatory SeqRecord argument.
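
Purely as a sketch of what a mandatory ID could look like (StrictRecord
is a hypothetical class for illustration only, not a proposed
Biopython API, and the non-empty check is my assumption based on the
file format point above):

```python
class StrictRecord:
    """Hypothetical record type where the ID is a required argument."""

    def __init__(self, seq, id):
        # No default value for id: omitting it raises a TypeError at call time.
        if not isinstance(id, str):
            raise TypeError("id argument should be a string")
        if not id:
            raise ValueError("id argument should not be empty")
        self.seq = seq
        self.id = id


record = StrictRecord("G", "seq1")

# Leaving the ID out fails immediately instead of producing a placeholder.
try:
    StrictRecord("G")
    missing_id_rejected = False
except TypeError:
    missing_id_rejected = True
```

With no default there is no need for a placeholder such as
"<unknown id>" or for None handling, but any existing code that calls
the constructor without an ID would break.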

Peter



