[Biopython-dev] Simpler SeqRecord creation

Sun May 16 11:48:49 UTC 2010

On Sat, May 15, 2010 at 6:53 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Peter and Michiel;
>
>> > If the SeqRecord __init__ checked for a plain string as
>> > the sequence, it could automatically upgrade it into a
>> > Seq object with the default argument, thus:
>> >
>> > from Bio.SeqRecord import SeqRecord
>> > rec = SeqRecord("ACGT", id="Test")
>
>> Simpler SeqRecord creation is good in itself, but I wouldn't spend too
>> much time on it. If, hopefully, we deprecate alphabets some day, then
>> a Seq object reduces to a string anyway.
>
> Accepting strings seems like a good way to start a transition from
> Seq objects to standard strings. +1 for this.

I'm not convinced about moving from Seq objects to plain strings.
I *like* having the biological methods as part of the Seq object.

I can also think of several useful Seq subclass objects such as
2bit encoded unambiguous DNA or RNA (BioJava has this) or
4bit encoded ambiguous DNA or RNA (the BAM format uses this).
These would be a trade-off, using less memory at the expense of
being a bit slower for many operations - they could be very useful
in dealing with next generation sequence data.
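
To make the memory trade-off concrete, here is a rough pure-Python
sketch of 2bit packing for unambiguous DNA. The function names and the
bit layout are assumptions made for illustration only; this is not how
BioJava or BAM lay out their data, and it is not a proposed Biopython
API.

```python
# Hypothetical sketch: pack unambiguous DNA into 2 bits per base.
# The A/C/G/T -> 0/1/2/3 mapping and bit order are arbitrary choices.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"


def pack_2bit(seq):
    """Return (packed_bytes, length), storing four bases per byte.

    Raises KeyError for anything other than A, C, G or T, since
    ambiguous letters do not fit in 2 bits.
    """
    packed = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        packed[i // 4] |= _CODE[base] << (2 * (i % 4))
    return bytes(packed), len(seq)


def unpack_2bit(packed, length):
    """Rebuild the plain string from the packed bytes."""
    return "".join(
        _BASE[(packed[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )


packed, length = pack_2bit("ACGTAC")
print(len(packed), unpack_2bit(packed, length))
```

Six bases fit in two bytes here instead of six, but every access needs
a shift and a mask, which is where the slowdown would come from.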

> It would also be useful if the defaults for id, name and description
> were empty strings instead of "<unknown whatever>." These don't seem
> especially useful, and when generating SeqRecords and writing them
> to Fasta, this helps avoid having to explicitly set descriptions to
> an empty string.

Yes, I like that idea for name and description. I'm not 100% sure
about having a default ID - I'd prefer that it were mandatory, since
so much depends on it (e.g. SeqIO and AlignIO), and a default of the
empty string may have side effects. Changing these defaults won't
hurt performance, which is good. Something to change after we
release Biopython 1.54 this coming week?
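
As a rough sketch of how the two ideas in this thread could fit
together, here is a toy example using stand-in classes. ToySeq,
ToySeqRecord and fasta_header are hypothetical names invented for this
illustration; none of this is the actual Bio.SeqRecord code.

```python
class ToySeq:
    """Hypothetical stand-in for a Seq object (no alphabet handling)."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data


class ToySeqRecord:
    """Hypothetical stand-in for SeqRecord, showing the proposed defaults."""

    def __init__(self, seq, id, name="", description=""):
        # Upgrade a plain string to a sequence object automatically.
        if isinstance(seq, str):
            seq = ToySeq(seq)
        self.seq = seq
        self.id = id  # mandatory: no default value
        self.name = name  # empty rather than an "<unknown ...>" marker
        self.description = description


def fasta_header(record):
    """Hypothetical helper: build a FASTA title line, skipping empty parts."""
    return ">" + " ".join(p for p in (record.id, record.description) if p)


rec = ToySeqRecord("ACGT", id="Test")
print(fasta_header(rec))
print(rec.seq)
```

With an empty default description there is nothing to blank out
before writing the record, and leaving out the id fails immediately
with a TypeError rather than producing a record with no identifier.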

Peter