[Biopython] Fetching fasta sequences by accession number

Iddo Friedberg idoerg at gmail.com
Fri Apr 20 18:44:44 UTC 2018


UniProt offers several APIs for programmatic access:
http://www.uniprot.org/help/programmatic_access. I am not sure there is a
Biopython module that wraps them, but it should be easy to do with a short
script that retrieves this generic URL:

https://www.uniprot.org/uniprot/P12345.fasta

where "P12345" is replaced by whatever UniProt accession you have.
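
A minimal sketch of such a script, using only the Python standard library.
The URL pattern is the one quoted above and may have changed since this was
written; the function names and the output file name are just illustrative
choices, not part of any UniProt or Biopython API.

```python
from urllib.request import urlopen

# URL pattern taken from the message above; treat it as an assumption,
# since UniProt may have changed its URL scheme since then.
UNIPROT_FASTA_URL = "https://www.uniprot.org/uniprot/{accession}.fasta"


def fasta_url(accession, template=UNIPROT_FASTA_URL):
    """Build the per-entry FASTA URL for one UniProt accession."""
    return template.format(accession=accession.strip())


def download_text(url, timeout=30):
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_fasta(accessions, fetch=download_text):
    """Return one concatenated FASTA string for all accessions.

    `fetch` can be swapped out, e.g. to add retries or to run offline.
    """
    records = []
    for accession in accessions:
        text = fetch(fasta_url(accession))
        # Keep exactly one trailing newline per record.
        records.append(text.strip() + "\n")
    return "".join(records)
```

Calling fetch_fasta(["P12345"]) and writing the result to a file such as
sequences.fasta gives you a single concatenated FASTA file. For a long list
of accessions it is polite to pause between requests.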


Then you can upload your concatenated FASTA file to the HMMER site. I am
not sure what their size limit is.

If it's only Swiss-Prot you are interested in, and you have the disk space,
I suggest you download it, along with HMMER and whatever reference databases
you wish to run against, and do it all locally, especially if you have a
large number of sequences to process. Biopython can read Swiss-Prot flat
files or UniProt XML files via SeqIO (formats "swiss" and "uniprot-xml").
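
As a rough sketch of the local route, assuming Biopython is installed and
the downloaded file has already been decompressed, something like this
converts a Swiss-Prot or UniProt XML file to FASTA. The function name and
the file names in the comment are only examples.

```python
from Bio import SeqIO


def to_fasta(in_path, out_path, in_format="swiss"):
    """Convert a sequence file to FASTA and return the number of records.

    in_format="swiss" reads Swiss-Prot flat files; "uniprot-xml" reads
    UniProt XML files.
    """
    return SeqIO.convert(in_path, in_format, out_path, "fasta")


# Example call (file names are placeholders for your own paths):
# count = to_fasta("uniprot_sprot.dat", "uniprot_sprot.fasta")
```

If you only need a subset of entries, iterating with SeqIO.parse and
filtering on record.id before writing is an alternative to converting the
whole file.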

To download UniProt: http://www.uniprot.org/downloads


On Fri, Apr 20, 2018 at 1:11 PM, Ahmad Abdelzaher <underoath006 at gmail.com>
wrote:

> Thank you for the reply. The accession numbers that I want to fetch the
> fasta sequences for are uniprot accession numbers. I want to search for
> homologs for these sequences on HMMER: https://www.ebi.ac.uk/Tools/hmmer/
>
> Any suggestions on how to do so?
>
> On Fri, Apr 20, 2018 at 10:16 AM, Iddo Friedberg <idoerg at gmail.com> wrote:
>
>> There is an example here for downloading multiple GenBank entries:
>> http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc131
>>
>> Depending on the actual database you are downloading from, you can use
>> rettype="fasta" or convert a genbank file to fasta as in here:
>> http://biopython.org/wiki/Converting_sequence_files
>>
>> The possible rettype and retmode values depend on the database you are
>> fetching from and are determined by the EFetch API. More about that here:
>> https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch
>>
>> HTH,
>>
>> Iddo
>>
>>
>> On Fri, Apr 20, 2018 at 6:25 AM, Ahmad Abdelzaher <underoath006 at gmail.com>
>> wrote:
>>
>>> How can I batch download fasta sequences by accession number? Is there a
>>> Biopython method that can do that? Any other suggestions or alternatives?
>>>
>>> Regards.
>>>
>>> _______________________________________________
>>> Biopython mailing list  -  Biopython at mailman.open-bio.org
>>> http://mailman.open-bio.org/mailman/listinfo/biopython
>>>
>>
>>
>>
>> --
>> Iddo Friedberg
>> http://iddo-friedberg.net/contact.html
>> ++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.>
>> ++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----.
>> .>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>>
>> >>----.<--.>++++++.<<<<------------------------------------.
>>
>
>

