[Biopython-dev] [Wg-phyloinformatics] GSoC Weekly Update: PhyloXML for Biopython

Peter biopython at maubp.freeserve.co.uk
Thu Jun 18 09:35:24 UTC 2009


On Thu, Jun 18, 2009 at 12:17 AM, Eric Talevich<eric.talevich at gmail.com> wrote:
> Hi Brad,
>
> Here's a mid-week update and partial response to your questions.
>
> *SeqRecord transformation*
>
> It would be nice if I could round-trip this sequence information perfectly,
> so that nothing's lost between reading and writing an arbitrary, valid
> PhyloXML file. For that to work, PhyloXML.Sequence.from_seqrec()
> would need to look at SeqRecord.features and assume that any matching
> keys have the appropriate PhyloXML meaning.
>
> These are the keys that from_seqrec() would look for:
>    location
>    uri
>    annotations
>    domain_architecture
>
> Do you see any risk of collision for those names? And for serialization,
> would it be unwholesome to convert Annotation and DomainArchitecture objects
> to a GFF-style dict-in-a-string? e.g. annotation="ref=foo;source=bar;..." --
> it's another layer of parsing and kind of esoteric, but I can live with it.
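
For reference, here is a rough sketch of what such a GFF-style
dict-in-a-string round trip might look like. The function names
encode_attrs and decode_attrs are made up for illustration and are not
part of Biopython or the PhyloXML code; percent-escaping is my own
assumption, added so that values containing ";" or "=" survive.

```python
from urllib.parse import quote, unquote


def encode_attrs(attrs):
    """Serialize a flat dict to a 'key=value;key=value' string.

    Hypothetical helper: keys and values are percent-escaped so that
    literal ';' and '=' characters cannot break the format.
    """
    return ";".join(
        "%s=%s" % (quote(str(key), safe=""), quote(str(value), safe=""))
        for key, value in attrs.items()
    )


def decode_attrs(text):
    """Parse a string produced by encode_attrs back into a dict of strings."""
    result = {}
    for item in text.split(";"):
        if not item:
            continue  # tolerate a trailing or doubled ';'
        key, _, value = item.partition("=")
        result[unquote(key)] = unquote(value)
    return result


annotation = encode_attrs({"ref": "foo", "source": "bar"})
restored = decode_attrs(annotation)
```

Note that everything comes back as a string, so nested objects such as
DomainArchitecture would need their own flattening scheme on top of this.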

If you can show us a sample record, I would be better able to comment
on how I would store it in a SeqRecord.

Are you fully familiar with the SeqRecord object, its annotations dictionary,
and its list of SeqFeature objects? Each SeqFeature has a location relative
to the parent SeqRecord and its own annotations dictionary (although under
the name of qualifiers, for some reason). Perhaps you'd like to proofread the
new SeqRecord chapter in the tutorial. It is still a work in progress, but
should be informative.
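
As a minimal sketch of those pieces (the sequence, identifiers, URI, and
qualifier names below are invented for illustration, and storing a
PhyloXML "uri" in the annotations dictionary is only one possible
convention, not an agreed mapping):

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Record-level information lives on the SeqRecord itself.
record = SeqRecord(
    Seq("MKTAYIAKQRQISFVKSHFSRQ"),
    id="example_1",
    name="example",
    description="hypothetical protein",
)

# Record-wide key/value data goes in the annotations dictionary.
record.annotations["uri"] = "http://example.org/seq/example_1"

# Anything tied to a region of the sequence is a SeqFeature, with a
# location relative to the parent record and its own qualifiers dict.
domain = SeqFeature(
    FeatureLocation(2, 10),
    type="domain",
    qualifiers={"name": ["hypothetical_domain"], "confidence": ["0.9"]},
)
record.features.append(domain)

# The feature's location can be used to pull out the matching residues.
domain_seq = str(domain.extract(record.seq))
```

Something like a PhyloXML domain architecture, which has coordinates,
looks like a better fit for features than for the record-level
annotations, but that depends on what the sample record contains.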

Peter



