[Biopython-dev] MAF Parser/Writer/Indexer

Peter Cock p.j.a.cock at googlemail.com
Mon May 16 17:58:23 UTC 2011


On Mon, May 16, 2011 at 6:26 PM, Andrew Sczesnak
<andrew.sczesnak at med.nyu.edu> wrote:
> On 05/16/2011 07:14 AM, Peter Cock wrote:
>>
>> Do you think we should follow the speciesOrder directive if
>> present?
>
> Yeah, why not.  I started working on this and the problem was, as defined in
> the spec, the species is just "hg19" or "mm9," yet the records are in
> species.chromosome format.  Should we enforce that the species in a
> speciesOrder directive must exactly match a sequence identifier, or add a
> split and do some checks to make sure a record matches only one species in
> speciesOrder?

That is a subtlety I missed - maybe it is simpler to ignore speciesOrder
after all. I presume it is really intended as a graphical output directive.
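
For reference, the "split and check" option might look roughly like
this. It is only a sketch in plain Python: match_species and
sort_by_species_order are made-up names, not anything in Biopython,
and it assumes the species is whatever comes before the first dot in
the sequence identifier.

```python
def match_species(record_id, species_order):
    """Return the one species in species_order that record_id belongs to.

    Hypothetical helper. Assumes identifiers look like "hg19.chr1",
    i.e. the species is the text before the first dot.
    """
    prefix = record_id.split(".", 1)[0]
    matches = [s for s in species_order if s == prefix]
    if len(matches) != 1:
        raise ValueError(
            "%r matches %d entries in speciesOrder, expected exactly 1"
            % (record_id, len(matches))
        )
    return matches[0]


def sort_by_species_order(record_ids, species_order):
    """Sort identifiers to follow speciesOrder (hypothetical helper)."""
    return sorted(
        record_ids,
        key=lambda rid: species_order.index(match_species(rid, species_order)),
    )
```

A record whose prefix is missing from speciesOrder raises ValueError
here; whether that should be an error or just a warning is an open
question.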

>> Also I think we may need to do something rigorous with start/end
>> co-ordinates and strand in either the Seq or SeqRecord object.
>> They could be updated automatically during slicing and taking
>> reverse complement... they might not survive addition though.
>
> This is interesting.  I wonder if it makes sense to preserve this
> information if a SeqRecord is going to be manipulated outside a
> MultipleSeqAlignment object.  Could this be accomplished by
> migrating the annotation information to a SeqFeature?

I'm not sure how using a SeqFeature would work here.

Also consider that someone might manipulate the alignment
directly, e.g. alignment[:,10:60] to pull out fifty columns. That
seems like a use case where the start/end co-ordinates should
be updated nicely. Note that internally this calls record[10:60]
for each row of the alignment, so the slicing happens on SeqRecord
objects.
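
To make the co-ordinate update concrete, here is a rough sketch in
plain Python of what slicing one gapped row would need to do. The
function name slice_row is made up, and it assumes MAF-style
co-ordinates (zero-based start counted on the row's own strand, size
counting only non-gap letters) and non-negative column indices.

```python
def slice_row(start, gapped_seq, col_start, col_end, gap="-"):
    """Return (new_start, new_size, new_seq) for columns col_start:col_end.

    Hypothetical helper, not a Biopython API. `start` is the zero-based
    start of the row in its source sequence; column indices are assumed
    to be non-negative.
    """
    # Only real letters before the slice shift the start; gaps do not.
    skipped = len(gapped_seq[:col_start].replace(gap, ""))
    piece = gapped_seq[col_start:col_end]
    # The size counts the non-gap letters left in the slice.
    size = len(piece.replace(gap, ""))
    return start + skipped, size, piece
```

For example, slice_row(100, "AC--GTACGT", 1, 6) gives
(101, 3, "C--GT"): one letter is skipped, and the two gap columns do
not count towards the size.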

Peter



