[Biopython-dev] support for database of BOLDSYSTEMS?

Thu May 21 10:21:56 UTC 2015

Dear Biopythoneers,

Are there any users of the BOLD (Barcode of Life Data) System
who'd be interested in using or testing some code from Carlos Peña?

I was just reminded of his open pull request by an endorsement from
Eric Ma on GitHub... https://github.com/biopython/biopython/pull/438

(For any technical comments, please reply just to the development
list or on GitHub, rather than the main discussion list.)

Thanks,

Peter

On Wed, Dec 10, 2014 at 4:51 PM, Travis Wrightsman <twrig002 at ucr.edu> wrote:
> It might be best to contact the general list as well to see if anyone
> has used BOLD before. I visited the website for a few minutes today,
> it seems to be a data aggregator that offers taxonomic metadata.
>
> -Travis
>
>> On Dec 10, 2014, at 6:31 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>
>> Dear Biopythoneers,
>>
>> For those of you not following GitHub's pull request notifications,
>> recent Biopython contributor Carlos Peña has submitted his code
>> for the BOLD (Barcode of Life Data) System for possible inclusion
>> in Biopython (email included below), see:
>> https://github.com/biopython/biopython/pull/438
>>
>> I'm hoping someone on the list has used BOLD before, see
>> http://www.boldsystems.org/ - and could give some feedback
>> please?
>>
>> Or do we need to ask on the main mailing list?
>>
>> Thanks,
>>
>> Peter
>>
>> ---------- Forwarded message ----------
>> From: Carlos Peña <notifications at github.com>
>> Date: Wed, Dec 3, 2014 at 2:48 PM
>> Subject: [biopython] Proposal of new Biopython module: bold (#438)
>> To: biopython/biopython <biopython at noreply.github.com>
>>
>>
>> As I mentioned in an email to the dev list some time ago, I have been
>> working on a module to perform calls to the BOLD database via their API.
>> The BOLD database contains more than 1 million public DNA barcode
>> sequences (part of the COI gene). One of the most interesting services
>> is the possibility of sending the barcode sequence and retrieving the
>> taxon identification and more metadata from the BOLD servers.
>>
>> I just migrated the code to Biopython from a temporary GitHub
>> repository. You can see the documentation here
>> https://bold.readthedocs.org/en/latest/usage.html that covers all the
>> API methods provided by BOLD.
>>
>> This module includes unit tests with 99% coverage. The tests and
>> docstrings have been tested in Python 2.6, 2.7, 3.3, 3.4 and PyPy.
>>
>> I completed all the work that I could think of, hence the pull
>> request. I am open to feedback on this.
>>
>> ________________________________
>>
>> You can merge this Pull Request by running
>>
>>  git pull https://github.com/carlosp420/biopython patch-30
>>
>> Or view, comment on, or merge it at:
>>
>>  https://github.com/biopython/biopython/pull/438
>>
>> Commit Summary
>>
>> copy code in Biopython
>> added Experimental Warning
>> added tests
>>
>> File Changes
>>
>> A Bio/bold/__init__.py (33)
>> A Bio/bold/api.py (684)
>> A Bio/bold/utils.py (32)
>> A Tests/test_bold_api.py (261)
>> A Tests/test_bold_utils.py (40)
>> M setup.py (1)
>>
>> Patch Links:
>>
>> https://github.com/biopython/biopython/pull/438.patch
>> https://github.com/biopython/biopython/pull/438.diff
>>
>> —
>> Reply to this email directly or view it on GitHub.
>>
>>
>>> On Wed, Nov 5, 2014 at 10:45 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>> Hi Carlos,
>>>
>>> I've not done anything with Twisted or other asynchronous mechanisms
>>> for accessing online resources - services like the NCBI discourage
>>> submitting multiple requests in parallel anyway.
>>>
>>> One idea might be to leave that to the library's user, and focus on the
>>> lower level API (building the URLs, parsing the returned values, etc)?
>>>
>>> Peter
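
A minimal sketch of that lower-level split (build the URL, parse the
returned values), with the networking left to the caller. The endpoint,
the db value, and the sample XML below are assumptions made up for
illustration; they are not taken from the pull request and should be
checked against BOLD's own API documentation:

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed endpoint for BOLD's identification service (not verified here).
BASE_URL = "http://www.boldsystems.org/index.php/Ids_xml"


def build_id_url(sequence, db="COX1"):
    """Return the request URL for one barcode sequence (db value is assumed)."""
    return BASE_URL + "?" + urlencode({"db": db, "sequence": sequence})


def parse_matches(xml_text):
    """Turn a <matches> XML document into a list of {tag: text} dicts."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in match}
            for match in root.iter("match")]


# Hypothetical response body, invented for this example only.
SAMPLE = """<matches>
  <match>
    <taxonomicidentification>Example species</taxonomicidentification>
    <similarity>0.99</similarity>
  </match>
</matches>"""

url = build_id_url("ACGT")
matches = parse_matches(SAMPLE)
```

Because neither function touches the network, both can be unit tested
offline, and the caller stays free to choose how (and how fast) requests
are actually sent.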
>>>
>>>
>>>> On Tue, Nov 4, 2014 at 8:31 PM, Carlos Peña <mycalesis at gmail.com> wrote:
>>>> Hi all,
>>>>
>>>>
>>>> I have written an interface to the BOLD database of DNA barcodes. It accepts
>>>> FASTA files, sends them to BOLD and gets the specimen identifications to the
>>>> species level:
>>>>
>>>> https://github.com/carlosp420/bold_retriever
>>>>
>>>> I was wondering whether it could be included in Biopython? So far the
>>>> package is a bunch of scripts and I want to make it more robust.
>>>> The working version is not very efficient, as the running time grows
>>>> quadratically (n squared).
>>>>
>>>> However, I was able to use asynchronous calls (using Twisted) to make it
>>>> faster. With them, the script took about n seconds for n sequences.
>>>> But I don't fully understand Twisted and the package is unstable.
>>>>
>>>> So, I wanted to ask if this little project of mine has any hope of getting
>>>> into Biopython. If that is the case I would need some pointers on using
>>>> proper classes for the code and fixing the code so that it can be
>>>> integrated. I guess I would need to drop Twisted and use a standard
>>>> Python library for multithreading instead.
>>>>
>>>> I want to improve the package anyways, make it more robust and quick. So I
>>>> wanted to ask before giving another chance to Twisted.
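
For reference, one standard-library alternative to Twisted is a small
thread pool from concurrent.futures. This is only a sketch under stated
assumptions: identify_all and fake_fetch are hypothetical names, the
fetch function stands in for a real HTTP request, and concurrent.futures
is in the standard library on Python 3 only (Python 2.6/2.7 would need
the "futures" backport). The pool is kept small since online services
often discourage many parallel requests:

```python
from concurrent.futures import ThreadPoolExecutor


def identify_all(sequences, fetch, max_workers=4):
    """Call fetch(seq) for every sequence on a small thread pool.

    Results come back in the same order as the input sequences.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, sequences))


def fake_fetch(sequence):
    # Hypothetical stand-in for a real request to the remote service.
    return (sequence, len(sequence))


results = identify_all(["ACGT", "AC", "A"], fake_fetch)
```

Passing the fetch function in as an argument keeps the threading
separate from the request logic, so either part can be swapped or
tested on its own.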
>>>>
>>>> Any comments would be appreciated,
>>>>
>>>>
>>>> carlos
>>>>
>>>>
>>>> Dr. Carlos Peña
>>>> Laboratory of Genetics
>>>> Department of Biology
>>>> University of Turku
>>>> 20014 Turku
>>>> FINLAND
>>>>
>>>>
>>>> _______________________________________________
>>>> Biopython-dev mailing list
>>>> Biopython-dev at mailman.open-bio.org
>>>> http://mailman.open-bio.org/mailman/listinfo/biopython-dev