[Biopython-dev] support for database of BOLDSYSTEMS?

Peter Cock p.j.a.cock at googlemail.com
Wed Dec 10 14:31:21 UTC 2014


Dear Biopythoneers,

For those of you not following GitHub's pull request notifications,
recent Biopython contributor Carlos Peña has submitted his code
for the BOLD (Barcode of Life Data) System for possible inclusion
in Biopython (email included below), see:
https://github.com/biopython/biopython/pull/438

I'm hoping someone on the list has used BOLD before, see
http://www.boldsystems.org/ - and could give some feedback
please?

Or should we ask on the main mailing list?

Thanks,

Peter

---------- Forwarded message ----------
From: Carlos Peña <notifications at github.com>
Date: Wed, Dec 3, 2014 at 2:48 PM
Subject: [biopython] Proposal of new Biopython module: bold (#438)
To: biopython/biopython <biopython at noreply.github.com>


As I mentioned in an email to the dev list some time ago, I have been
working on a module to perform calls to the BOLD database via their API.
The BOLD database contains more than 1 million public DNA barcode
sequences (part of the COI gene). One of the most interesting services
is the possibility of sending the barcode sequence and retrieving the
taxon identification and more metadata from the BOLD servers.
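
As a rough illustration of the identification service described above
(this is not code from the pull request), the two low-level steps are
building the query URL and parsing the XML reply. The endpoint URL, the
"db" and "sequence" parameter names, and the XML element names below
are assumptions and should be checked against the BOLD API
documentation. The helper names are hypothetical, and the sketch uses
Python 3 only.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint; verify against the BOLD API documentation before use.
BOLD_ID_URL = "http://boldsystems.org/index.php/Ids_xml"


def build_id_url(sequence, db="COX1_SPECIES_PUBLIC"):
    """Return the query URL for one barcode sequence (hypothetical helper)."""
    return BOLD_ID_URL + "?" + urlencode({"db": db, "sequence": sequence})


def parse_matches(xml_text):
    """Turn an XML reply into a list of dicts, one per <match> element."""
    root = ET.fromstring(xml_text)
    matches = []
    for match in root.iter("match"):
        matches.append({
            "bold_id": match.findtext("ID"),
            "taxon": match.findtext("taxonomicidentification"),
            "similarity": float(match.findtext("similarity", default="0")),
        })
    return matches


# Hand-written reply used only for illustration (not real BOLD output).
SAMPLE_REPLY = """<matches>
  <match>
    <ID>EXAMPLE001</ID>
    <taxonomicidentification>Example species</taxonomicidentification>
    <similarity>0.99</similarity>
  </match>
</matches>"""

url = build_id_url("ACGT")
hits = parse_matches(SAMPLE_REPLY)
```

Keeping URL construction and parsing separate from the network call
means both can be tested offline, and the caller decides how (and how
often) to send the requests.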

I just migrated the code to Biopython from a temporary GitHub
repository. The documentation, which covers all the API methods
provided by BOLD, is here:
https://bold.readthedocs.org/en/latest/usage.html

This module includes unit tests with 99% coverage. The tests and
docstrings have been run under Python 2.6, 2.7, 3.3, 3.4 and PyPy.

I completed all the work that I could think of, hence the pull
request. I am open to feedback on this.

________________________________

You can merge this Pull Request by running

  git pull https://github.com/carlosp420/biopython patch-30

Or view, comment on, or merge it at:

  https://github.com/biopython/biopython/pull/438

Commit Summary

copy code in Biopython
added Experimental Warning
added tests

File Changes

A Bio/bold/__init__.py (33)
A Bio/bold/api.py (684)
A Bio/bold/utils.py (32)
A Tests/test_bold_api.py (261)
A Tests/test_bold_utils.py (40)
M setup.py (1)

Patch Links:

https://github.com/biopython/biopython/pull/438.patch
https://github.com/biopython/biopython/pull/438.diff


On Wed, Nov 5, 2014 at 10:45 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi Carlos,
>
> I've not done anything with Twisted or other asynchronous mechanisms
> for accessing online resources - services like the NCBI discourage
> submitting multiple requests in parallel anyway.
>
> One idea might be to leave that to the library's user, and focus on the
> lower level API (building the URLs, parsing the returned values, etc)?
>
> Peter
>
>
> On Tue, Nov 4, 2014 at 8:31 PM, Carlos Peña <mycalesis at gmail.com> wrote:
>> Hi all,
>>
>>
>> I have written an interface to the BOLD database of DNA barcodes. It accepts
>> FASTA files, sends them to BOLD and gets the specimen identifications to the
>> species level:
>>
>> https://github.com/carlosp420/bold_retriever
>>
>> I was wondering whether it could be included in Biopython? So far the
>> package is a bunch of scripts and I want to make it more robust.
>> The working version is not very efficient, as the running time grows
>> quadratically (n squared).
>>
>> However, I was able to use asynchronous calls (using Twisted) to make it
>> faster. With them the script took about n seconds for n sequences.
>> But I don't fully understand Twisted and the package is unstable.
>>
>> So, I wanted to ask if this little project of mine has any hope of getting
>> into Biopython. If that is the case I would need some pointers on using
>> proper classes for the code and fixing the code so that it can be
>> integrated. I guess I would need to drop Twisted and use a standard
>> Python library for multithreading instead.
>>
>> I want to improve the package anyway, to make it more robust and quick. So I
>> wanted to ask before giving Twisted another chance.
>>
>> Any comments would be appreciated,
>>
>>
>> carlos
>>
>>
>> Dr. Carlos Peña
>> Laboratory of Genetics
>> Department of Biology
>> University of Turku
>> 20014 Turku
>> FINLAND


