[Biopython-dev] Tutorial

Mon Dec 14 11:06:14 UTC 2015

On Sun, Dec 13, 2015 at 8:17 PM, Tiago Antao <tra at popgen.net> wrote:
> Hi,
>
> On 2015-12-13 06:03, Peter Cock wrote:
>>
>> If you spotted any untested example without dependencies,
>> please try adding the doctest comment markup to see if we
>> can expand the test_Tutorial.py coverage.
>
>
> As I go through the problems with the notebook, I can report the issues on
> github, but I am diverting all the time that I have to do this to the
> notebook version.
>
>
>> What other big datasets are you thinking of here? That was
>> all that came to mind, but the Tutorial has a lot in it.
>
>
> I will report this as I go through the tutorial conversion, but for now:
>
> 1. As you say, the uniprot file

This was already using the smaller uniprot_sprot.dat file rather than
the larger uniprot_trembl.dat, see:

ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/

We could include links to both the current release and a specific
older, smaller release, with the latter used for the live testing?

ftp://ftp.uniprot.org/pub/databases/uniprot/previous_releases/

It would require careful wording to avoid confusion...

> 2. This code on the same chapter:
> # this will download the files - currently there are more than 16, but we will do only 4
> import os
> for i in range(1, 5):
>     os.system('wget ftp://ftp.ncbi.nih.gov/genbank/gbvrl%i.seq.gz -O data/gbvrl%i.seq.gz' % (i, i))
>     os.system('gzip -d data/gbvrl%i.seq.gz' % i)
>
> Currently, the LaTeX tutorial will download 16 files. These are no longer
> the whole current database, so the lookup code afterwards will fail (i.e.
> there is a bug in the tutorial).
>
> I opted to download only 4 files (as per above) and to make sure that I get
> a very early key example.

The tutorial was meant to get you to download all the chunks (16 at the
time of writing), but I like your idea about picking a record near the start
so this can be done with just the first few chunks. Already logged as
https://github.com/biopython/biopython/issues/714
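
For reference, the "first few chunks only" idea could be sketched in pure
Python, without shelling out to wget and gzip. This is only a rough sketch:
the helper names (chunk_names, fetch_chunk), the data directory, and the
default count of 4 are placeholders, not what the Tutorial currently
contains. The base URL is the one from the quoted loop.

```python
import gzip
import os
import shutil
from urllib.request import urlopen

# Base URL taken from the quoted wget command.
BASE_URL = "ftp://ftp.ncbi.nih.gov/genbank"


def chunk_names(count=4):
    """Return the names of the first `count` viral GenBank chunks (hypothetical helper)."""
    return ["gbvrl%i.seq" % i for i in range(1, count + 1)]


def fetch_chunk(name, data_dir="data"):
    """Download and decompress one chunk, skipping it if already present.

    Hypothetical helper: streams the .gz file and writes the
    uncompressed GenBank text to data_dir/name.
    """
    os.makedirs(data_dir, exist_ok=True)
    target = os.path.join(data_dir, name)
    if os.path.exists(target):
        return target
    partial = target + ".part"
    with urlopen("%s/%s.gz" % (BASE_URL, name)) as response:
        with gzip.open(response, "rb") as unzipped, open(partial, "wb") as out:
            shutil.copyfileobj(unzipped, out)
    # Only rename once the whole file has been written.
    os.replace(partial, target)
    return target
```

Calling fetch_chunk(name) for each name in chunk_names() would fetch
the same four files as the quoted loop; whichever record the example then
looks up has to live in one of those first chunks.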

Peter