[Biopython-dev] Arlequin sequence files in Bio.Popgen

Tue Jul 7 07:55:58 UTC 2009

Hi David,

[I am CCing the biopython-dev mailing list, so that other Biopython
developers can chip in]

2009/7/7 David Winter <winda002 at student.otago.ac.nz>:
> Is there any plan to support arlequin in Bio.Popgen? The script that I have

Bio.PopGen currently supports Simcoal, so it really ought to support
Arlequin already (as Simcoal outputs Arlequin files). Unfortunately I
never got round to writing an Arlequin parser (which would make a lot
of sense, for many reasons).

> to have a go at getting it to work in that framework.

That would be more than welcome. I personally have an interest in
getting it up and running. Arlequin format support is an important
thing. If you have little time, I can offer to help.

If you prefer to go ahead alone you are also more than welcome to do
it. Just don't make the same mistake that I made with the GenePop
parser, where I load the whole file into memory. I have discovered
that a lot of people have thousands of markers and thousands of
individuals (loading such a file into memory is in some cases
impossible). Using an iterator might be a solution.
It might also be worth asking the Arlequin developers for a
specification of the format (as far as I know there is no public
specification).
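
To make the iterator idea concrete, here is a minimal sketch of a
parser that yields one sample at a time instead of reading the whole
file. The function name iter_samples is hypothetical, and the layout
it expects (a SampleName="..." line followed by a SampleData= { ... }
block) is my assumption about the format, not an official
specification, so treat it as an illustration only.

```python
import io


def iter_samples(handle):
    """Yield (sample_name, data_lines) for one sample at a time.

    Assumed layout (not an official specification): each sample has a
    SampleName="..." line followed by a SampleData= { ... } block.
    """
    name = None
    lines = None  # None while we are outside a SampleData block
    for raw in handle:
        line = raw.strip()
        if lines is None:
            if line.startswith("SampleName"):
                name = line.split("=", 1)[1].strip().strip('"')
            elif line.startswith("SampleData"):
                lines = []  # entering the data block
        elif line.startswith("}"):
            # End of the block: hand the sample over and forget it
            yield name, lines
            name, lines = None, None
        elif line:
            lines.append(line)


# Made-up example content, for illustration only
EXAMPLE = """[Data]
[[Samples]]
SampleName="pop1"
SampleSize=2
SampleData= {
ind1 1 ACGT
ind2 1 ACGA
}
SampleName="pop2"
SampleSize=1
SampleData= {
ind3 1 TCGT
}
"""

for sample_name, rows in iter_samples(io.StringIO(EXAMPLE)):
    print(sample_name, len(rows))
```

With this approach only one sample is held in memory at any time. If
a single sample is itself too large, the same idea could be pushed
one level down so that individuals are yielded one by one.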

Code in Biopython has to have documentation and unit tests (a boring
thing, but necessary). In this case, I would not mind doing that
myself (if you are not interested) as I think Arlequin support is
a really cool thing.

I will sort out the git links, thanks for the info.

BTW if you are doing any kind of frequency-based statistics, we are
adding support for GenePop statistics (mainly a Python wrapper to the
application). You can now get things like Fst, Fis and the like from
inside Python.

Feel free to write back with any comments you might have.

Tiago