[Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Thu Nov 6 18:34:03 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2643

------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-06 13:34 EST -------
Replying to Marco's email on the dev mailing list:

>> Are there any other tools that output this file format?  Do you think the
>> author might be willing to just add an option to output the sequences in
>> another format (e.g. FASTA, or better an alignment format designed for more
>> than one alignment).  This would be a neater solution in the long run (and
>> would benefit anyone using fastPhase - not just Biopython).
>
> Not to my knowledge.
> Anyway, consider that a fastPhase run could take days for medium/big samples.
> In some situations it could be faster to convert its output to fasta
> (or other ones) directly, instead of re-calculating the results.

OK - I had not appreciated the run time involved.  Clearly it would not be
sensible to have to repeat a long analysis just to get the results in another
format (e.g. as FASTA, or the simplified -Z output, whatever that looks like).

>> If it is for DNA only, the sequences/alignments returned should ideally
>> specify a DNA alphabet.
>
> mmm ok...
> Basically it could also be used with characters like genes and other
> markers, but in that case it would not make sense to parse it as a
> sequence, so nobody would try to do it.

That's interesting, and means assuming DNA wouldn't be safe.  Just use the
single letter alphabet then (rather than defaulting to the completely generic
base alphabet).

>>> Because that would mean that one individual has only one chromosome.
>>> It doesn't make sense to run fastPhase on a haploid individual.
>>
>> Is fastPhase only for diploids?  Could it be used with polyploidy (e.g.
>> plants)?
>
> I think not... It would be another class of problem.
> What fastPhase does is try to infer haplotypes from genotype data.

OK - you can probably tell I'm not a population biologist from the questions ;)

>> I was actually thinking the -Z format might be much simpler to deal
>> with (I didn't mean to suggest supporting both).  On the other hand,
>> the documentation does say the -Z is "not intended for general use".
>
> The problem is that it could take days to run fastPhase... most of
> the time you want the longer format, and then proceed to parse it.
> Anyway, it should not be a big problem to implement it.

OK (as I wrote above), I can see now that using the simplified -Z output is not
sensible.

> (I am just putting all of that information in SeqRecord.description)

If we know the meaning of some of these fields, then ideally they should go in
the annotations dictionary, rather than just in the SeqRecord description.
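
Something along these lines is what I have in mind. This is only a rough
sketch: the field names (subpopulation, haplotype_index, note) and the
build_record helper are made up for illustration, and I have not checked them
against what fastPhase actually writes. Fields we understand go into
annotations, anything else is left in the description.

```python
# Minimal sketch. The field names below are hypothetical, not taken from
# the fastPhase documentation.
try:
    from Bio.Seq import Seq
    from Bio.SeqRecord import SeqRecord
except ImportError:
    # Stand-in so the sketch still runs without Biopython installed; it
    # only mimics the SeqRecord attributes used below.
    Seq = str

    class SeqRecord(object):
        def __init__(self, seq, id="", description=""):
            self.seq = seq
            self.id = id
            self.description = description
            self.annotations = {}

# Fields whose meaning the parser is assumed to understand.
KNOWN_FIELDS = ("subpopulation", "haplotype_index")


def build_record(sample_id, haplotype, fields):
    """Return a SeqRecord with the known fields stored as annotations."""
    record = SeqRecord(Seq(haplotype), id=sample_id)
    leftovers = []
    for key, value in sorted(fields.items()):
        if key in KNOWN_FIELDS:
            # Structured data: callers can look this up by key.
            record.annotations[key] = value
        else:
            # Unknown meaning: keep it as free text only.
            leftovers.append("%s=%s" % (key, value))
    record.description = " ".join(leftovers)
    return record


record = build_record(
    "sample1_hap1",
    "ACGTTGCA",
    {"subpopulation": "1", "haplotype_index": 1, "note": "unparsed"},
)
```

Downstream code can then use record.annotations["subpopulation"] directly
instead of having to re-parse the description string.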

Peter


