[Biopython-dev] Arlequin sequence files in Bio.Popgen

Tiago Antão tiagoantao at gmail.com
Tue Jul 7 07:55:58 UTC 2009

Hi David,

[I am Ccing the biopython-dev mailing list, so that other biopython
dev people can chip in]

2009/7/7 David Winter <winda002 at student.otago.ac.nz>:
> Is there any plan to support arlequin in Bio.Popgen? The script that I have

Bio.PopGen currently supports Simcoal, so in a sense it should already
support Arlequin (as Simcoal outputs Arlequin format). Unfortunately, I
never got round to writing an Arlequin parser (which makes full sense, for a lot of

> to have a go at getting it to work in that framework.

That would be more than welcome. I personally have an interest in
getting it up and running, as Arlequin format support is important.
If you have little time, I can offer to help.

If you prefer to go ahead alone, you are also more than welcome to do
it. Just don't make the same mistake that I made with the GenePop
parser: loading the whole file into memory. I have discovered that a
lot of people have thousands of markers and thousands of individuals,
and loading such a file into memory is in some cases impossible. Using
an iterator might be a solution.

One might try going to the Arlequin developers and asking for a
specification of the format (as far as I know, there is no publicly
available specification).
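To make the iterator idea concrete, here is a minimal sketch of a streaming reader. It is not Biopython code: the function iter_arlequin_samples and the Haplotype record are hypothetical names, and the file layout is an assumption (a simplified haploid .arp file where each sample has a SampleName="..." line and a "SampleData= {" ... "}" block of "<id> <count> <data>" lines). Because it is a generator, only the current line is held in memory, however many samples the file contains.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Haplotype(NamedTuple):
    """One entry of a SampleData block (hypothetical record type)."""
    sample: str
    hap_id: str
    count: int
    data: str


def iter_arlequin_samples(handle: TextIO) -> Iterator[Haplotype]:
    """Yield haplotype records one at a time from an Arlequin-style file.

    ASSUMPTION: simplified haploid layout (GenotypicData=0), with entries
    between "SampleData= {" and "}" written as "<id> <count> <data>".
    Diploid data (two lines per individual) is not handled here.
    """
    sample = None
    in_data = False
    for raw in handle:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("SampleName="):
            # SampleName="Pop1" -> Pop1
            sample = line.split("=", 1)[1].strip().strip('"')
        elif line.startswith("SampleData="):
            in_data = True  # entries start on the following lines
        elif in_data and line.startswith("}"):
            in_data = False  # end of this sample's data block
        elif in_data:
            hap_id, count, data = line.split(None, 2)
            yield Haplotype(sample, hap_id, int(count), data)


# A tiny made-up input, only to exercise the sketch.
example = """[Profile]
  Title="toy example"
  NbSamples=2
  DataType=DNA
  GenotypicData=0
[Data]
[[Samples]]
  SampleName="Pop1"
  SampleSize=3
  SampleData= {
    h1 2 ACGT
    h2 1 ACGA
  }
  SampleName="Pop2"
  SampleSize=1
  SampleData= {
    h1 1 ACGT
  }
"""

if __name__ == "__main__":
    for rec in iter_arlequin_samples(io.StringIO(example)):
        print(rec.sample, rec.hap_id, rec.count, rec.data)
```

With a real file one would pass an open file handle instead of io.StringIO, and callers can aggregate counts per sample on the fly rather than building the whole dataset first.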

Code in Biopython has to have documentation and unit tests (a boring
thing, but necessary). In this case, I would not mind doing that
myself (in case you are not interested), as I think Arlequin support is
a really cool thing.

I will sort out the git links, thanks for the info.

BTW, if you are doing any kind of frequency-based statistics, we are
adding support for GenePop statistics (mainly a Python wrapper around
the application). You can now get things like Fst, Fis and the like from
inside Python.

Feel free to write back with any comments you might have.


