[Bioperl-l] Phylip format error
Fields, Christopher J
cjfields at illinois.edu
Thu May 23 14:05:32 UTC 2013
On May 23, 2013, at 3:30 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, May 23, 2013 at 8:22 AM, Alexey Morozov
> <alexeymorozov1991 at gmail.com> wrote:
>> Which is also worsened by the fact that there is a relaxed phylip format,
>> which allows up to 250 chars for the taxon name. Names are separated from
>> the sequence by a single space, which creates problems if names were padded
>> to 10 chars with whitespace, as in Felsenstein's strict format. On the
>> whole, phylip is as messily defined a format as one can make from a plain
>> text file with the information content of fasta.
>> The BioPerl documentation says nothing about whether Bio::SeqIO accepts
>> relaxed phylip or how it tells the dialects apart. Even if the code support
>> is OK, it may be worthwhile to explain it somewhere at bioperl.org
>
> Biopython's AlignIO defines both a (strict) "phylip" and "relaxed-phylip"
> as two separate formats (or variants, like the "fastq" variants). Doing
> the same in BioPerl would seem sensible since auto-detection is not
> easy.
>
> http://biopython.org/wiki/AlignIO#File_Formats
>
> Peter
>
> P.S. Where does that 250-character limit for the taxon name come from?
> The trouble with relaxed phylip is that some tools are more relaxed than
> others ;)
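
To make the strict/relaxed ambiguity discussed above concrete, here is a minimal sketch in plain Python. It is not how BioPerl or Biopython parse the format; the helper names split_strict and split_relaxed are hypothetical, and the sketch assumes a single non-interleaved line whose name and sequence are the only fields.

```python
def split_strict(line):
    """Strict PHYLIP: the taxon name occupies exactly the first 10 columns."""
    name = line[:10].strip()
    seq = line[10:].replace(" ", "")
    return name, seq


def split_relaxed(line):
    """Relaxed PHYLIP: the taxon name runs up to the first space."""
    name, _, rest = line.partition(" ")
    seq = rest.replace(" ", "")
    return name, seq


# A strict-format line whose 10-column name contains a space.
line = "Homo sapieACGTACGT"

print(split_strict(line))   # name is the full 10 columns
print(split_relaxed(line))  # name is cut at the first space
```

The same line yields two different name/sequence pairs depending on the dialect assumed, which is why auto-detection is hard and an explicit variant setting is attractive.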
As Adam pointed out, prior to the introduction of 'relaxed phylip' we had an alternative solution that didn't require a modified format but still allowed one to use PHYLIP and other tools requesting the format. I think 'relaxed phylip' was introduced by CIPRES a few years back. Frankly, this is the first time I have seen this mentioned on the list; yay, yet another format variation :)
The variant format parsing (as implemented for SeqIO::fastq, as you know) deals with variant names like 'fastq-sanger', where the main format name comes first and the variant second. In 'relaxed-phylip' the order is reversed, which I'm pretty sure will not work. It's not impossible to allow, but we would probably support it like this initially:
my $in = Bio::AlignIO->new(-format  => 'phylip',
                           -variant => 'relaxed',
                           …);
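
As a rough illustration of why the name order matters, here is a minimal sketch in Python (not BioPerl's actual code). It assumes a 'format-variant' string is split on the first hyphen, with the format first. The function name split_format_variant is hypothetical.

```python
def split_format_variant(name):
    """Split 'format-variant' on the first hyphen; variant is None if absent."""
    fmt, sep, variant = name.lower().partition("-")
    return fmt, (variant if sep else None)


for name in ("fastq-sanger", "phylip", "relaxed-phylip"):
    # Under this assumed convention, 'relaxed-phylip' is read as
    # format 'relaxed' with variant 'phylip', which is the wrong way round.
    print(name, "->", split_format_variant(name))
```

Under that convention 'relaxed-phylip' would be looked up as a format called 'relaxed', so passing the variant as a separate argument avoids the problem.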
chris