[Biopython-dev] Parsing PAML supplementary output

Peter Cock p.j.a.cock at googlemail.com
Tue Oct 11 08:20:52 UTC 2011


On Tue, Oct 11, 2011 at 8:51 AM, Brandon Invergo <b.invergo at gmail.com> wrote:
>> If you can extend the current PHYLIP parser (strict or relaxed)
>> to cover interleaved and sequential, that would be nice. For
>> strict mode at least, we can in principle follow whatever the
>> original PHYLIP tools do to detect this automatically. It may
>> be safer to make it explicit, though: from what I recall, without
>> seeing the PHYLIP implementation's source code it was not
>> obvious how to do this reliably.
>>
> I checked out the PHYLIP code and, yes, it's not really obvious how the
> mode is detected. In fact, it seems that many of the programs ask the
> user to specify the format of the alignment.
>
> So, regarding making it explicit, I'm not sure if this is what you meant,
> but I was thinking it might be simplest to add another Iterator/Writer
> pair in the PhylipIO module for SequentialPhylip, inheriting from the
> basic Phylip classes and overriding the next() method in the iterator
> and the write_alignment() method in the writer, much as the
> RelaxedPhylip classes do.

Something like that as a new format variant, yes.
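To make the sequential/interleaved distinction concrete, here is a minimal standalone sketch of the two layouts. It does not use Biopython's PhylipIO classes: `write_phylip` and its `interleaved` and `block` parameters are hypothetical, names are followed by two spaces rather than padded to 10 columns, and each sequential record is kept on a single line for simplicity.

```python
import io


def write_phylip(records, handle, interleaved=False, block=60):
    """Write (name, sequence) pairs in sequential or interleaved layout.

    Hypothetical helper: names are followed by two spaces instead of being
    padded to 10 columns, and every sequence must have the same length.
    """
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("expected at least one sequence, all of equal length")
    nchar = lengths.pop()
    handle.write(f" {len(records)} {nchar}\n")
    if not interleaved:
        # Sequential: each sequence is written in full right after its name
        # (kept on one line here for simplicity).
        for name, seq in records:
            handle.write(f"{name}  {seq}\n")
        return
    # Interleaved: the first block carries the names; later blocks continue
    # each sequence in the same order, separated by blank lines.
    for start in range(0, nchar, block):
        if start:
            handle.write("\n")
        for name, seq in records:
            prefix = f"{name}  " if start == 0 else ""
            handle.write(prefix + seq[start:start + block] + "\n")


if __name__ == "__main__":
    out = io.StringIO()
    write_phylip([("alpha", "ACGTACGT"), ("beta", "ACGAACGA")], out,
                 interleaved=True, block=4)
    print(out.getvalue(), end="")
```

The only real difference between the two variants is whether a sequence is split into blocks that follow one another; that is why a subclass overriding just the writing and parsing methods seems a natural fit.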

> This would mean that there would be no flexibility in the naming rules
> (i.e. relaxed vs. strict) for the SequentialPhylip format, unless I were to
> also make a RelaxedSequentialPhylip pair of classes. PAML relaxes the
> sequence name length restriction to 30 characters, and since the whole
> reason for embarking on this exercise was to support PAML's output of
> PHYLIP alignments, if only one naming convention is to be implemented, I
> think it would be best to default to the relaxed rules.

Practical.
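The two naming conventions differ only in how the name field in front of each sequence is produced. A minimal sketch, assuming the 30-character PAML limit mentioned above, that relaxed names must not contain spaces, and that two spaces separate a relaxed name from its sequence; `format_name` and the constants are hypothetical, and unlike some writers this sketch refuses over-long strict names rather than truncating them.

```python
STRICT_NAME_WIDTH = 10  # classic PHYLIP: the name occupies exactly 10 columns
RELAXED_NAME_MAX = 30   # PAML's limit, as described in this thread


def format_name(name, relaxed=True):
    """Return the name field that precedes a sequence (hypothetical helper)."""
    if relaxed:
        if len(name) > RELAXED_NAME_MAX:
            raise ValueError(f"name longer than {RELAXED_NAME_MAX} characters: {name!r}")
        # Assumption: the name may not contain spaces, and two spaces mark
        # where it ends, so no fixed width is needed.
        if " " in name:
            raise ValueError(f"relaxed names cannot contain spaces: {name!r}")
        return name + "  "
    if len(name) > STRICT_NAME_WIDTH:
        raise ValueError(f"name longer than {STRICT_NAME_WIDTH} characters: {name!r}")
    # Strict: pad to a fixed 10-column field; the sequence starts in column 11.
    return name.ljust(STRICT_NAME_WIDTH)
```

Defaulting to `relaxed=True` mirrors the suggestion to prefer the relaxed rules when only one convention is implemented.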

> Slightly unrelated musings: I was thinking that with Biopython's support
> for reading PHYLIP alignments and Newick trees into objects, at some
> point it would probably be convenient to make the Bio.Phylo.PAML package
> more integrated by allowing the user to pass in such objects as
> arguments rather than writing them to files first; the PAML module could
> write them to temp files itself. I think some minor changes might have
> to be made in places (e.g. for PAML to accept interleaved alignments, the
> header line must contain an 'I' flag after the integers giving the number
> of sequences and the sequence length), and I'd have to think about how
> best to allow passing such objects while still retaining the ability to
> specify filenames without kludgy, non-pythonic type-checking. Anyway,
> that's a task for another day, but I thought I'd throw it out there.

Do we need to write the "I" flag in our PHYLIP output?
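One way to pass in-memory alignments to the PAML wrappers without inspecting argument types is to accept them through separate keyword arguments and write a temporary file only when needed. The sketch below is hypothetical: `alignment_path`, `phylip_header`, `cleanup` and the `seqfile`/`records` parameters are not part of Bio.Phylo.PAML, records are plain (name, sequence) tuples rather than Biopython alignment objects, and the optional `I` suffix follows the header convention described in this thread.

```python
import os
import tempfile


def phylip_header(records, interleaved):
    """Header: sequence count, alignment length and, for PAML, an 'I' flag
    when the data are interleaved (per the convention noted in this thread)."""
    nchar = len(records[0][1])
    flag = " I" if interleaved else ""
    return f" {len(records)} {nchar}{flag}\n"


def alignment_path(seqfile=None, records=None):
    """Return (path, is_temporary) for a hypothetical PAML wrapper option.

    Separate keyword arguments avoid type-checking: callers pass either an
    existing filename or in-memory (name, sequence) records, never both.
    """
    if (seqfile is None) == (records is None):
        raise ValueError("give exactly one of seqfile or records")
    if seqfile is not None:
        return seqfile, False
    # Write the records in sequential layout (so no 'I' flag is needed)
    # to a temporary file that survives until cleanup() is called.
    with tempfile.NamedTemporaryFile("w", suffix=".phy", delete=False) as tmp:
        tmp.write(phylip_header(records, interleaved=False))
        for name, seq in records:
            tmp.write(f"{name}  {seq}\n")
    return tmp.name, True


def cleanup(path, is_temporary):
    """Delete the file only if alignment_path() created it."""
    if is_temporary:
        os.remove(path)
```

A wrapper would call `alignment_path(...)` before running the PAML program and `cleanup(path, is_temporary)` in a `finally` clause, so user-supplied files are never deleted. Writing sequential output sidesteps the 'I' flag entirely; interleaved output would need it.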

Peter


