[BioPython] Need help to get Fasta sequence of Gis !

Andrew Dalke dalke at dalkescientific.com
Thu Apr 8 17:15:37 EDT 2004


Jonathan Boulais:
> Hi everyone!
> I'm a newbie to Biopython

Welcome!

> and I would like to get the FASTA sequences of a huge list of GIs. Any
> suggestions?

How huge? At some point it's better to just download GenBank and get
the data straight from there.
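If you do go the local route, the records you want can be pulled out of a downloaded FASTA file in one streaming pass, so the whole dump never has to fit in memory. This is a minimal sketch in Python 3, assuming the file uses NCBI-style "gi|NNN|..." header lines like the ones in the efetch output below; filter_fasta_by_gi is a hypothetical helper, not part of Biopython.

```python
import io

def filter_fasta_by_gi(in_handle, out_handle, wanted_gis):
    """Copy FASTA records whose GI is in wanted_gis; return how many matched.

    Hypothetical helper. Assumes headers look like '>gi|914034|gb|...'.
    """
    wanted = {str(gi) for gi in wanted_gis}
    keep = False
    matched = 0
    for line in in_handle:
        if line.startswith(">"):
            # The GI is the second '|'-separated field after the '>'.
            fields = line[1:].strip().split("|")
            keep = len(fields) > 1 and fields[0] == "gi" and fields[1] in wanted
            if keep:
                matched += 1
        if keep:
            # Header and sequence lines of a wanted record are copied verbatim.
            out_handle.write(line)
    return matched

if __name__ == "__main__":
    dump = io.StringIO(">gi|1|gb|A| one\nAAAA\n>gi|2|gb|B| two\nGGGG\n")
    out = io.StringIO()
    print(filter_fasta_by_gi(dump, out, ["2"]))
    print(out.getvalue(), end="")
```

Because only a set lookup happens per header, the cost stays roughly linear in the size of the dump even for a very long GI list.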

If it's small enough (10,000 or fewer records?), then look at the
Bio.EUtils client.
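Even within that range, a long GI list is usually easier to send as several smaller requests than as one. A minimal Python 3 sketch of splitting the list into fixed-size batches; the batches helper is hypothetical, and the batch size you pick is your own choice, not a limit stated in this thread.

```python
from itertools import islice

def batches(ids, size):
    """Yield successive lists of at most `size` ids from any iterable."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(ids)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

if __name__ == "__main__":
    gis = ["914034", "5263173", "1769808", "1060883"]
    for chunk in batches(gis, 3):
        # One request per chunk would go here.
        print(chunk)
```

Working from an iterator means the GI list can itself be streamed from a file rather than loaded all at once.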

 >>> from Bio import EUtils
 >>> from Bio.EUtils import ThinClient
 >>> client = ThinClient.ThinClient()
 >>> dbids = EUtils.DBIds("protein", ["914034", "5263173", "1769808", "1060883"])
 >>> f = client.efetch_using_dbids(dbids, retmode = "text", rettype = "fasta")
 >>> print f.read()
 >gi|914034|gb|AAB32951.1| cruxrhodopsin-2 [Haloarcula]
MLQSGMSTYVPGGESIFLWVGTAGMFLGMLYFIARGWSVSDQRRQKFYIATIMIAAIAFVNYLSMALGFG
VTTIELGGEERAIYWARYTDWLFTTPLLLYDLALLAGADRNTIYSLVGLDVLMIGTGALATLSAGSGVLP
AGAERLVWWGISTGFLLVLLYFLFSNLTDRASELSGDLQSKFSTLRNLVLVLWLVYPVLWLVGTEGLGLV
GLPIETAAFMVLDLTAKIGFGIILLQSHAVLDEGQTASEGAAVAD

 >gi|5263173|dbj|BAA81816.1| cruxrhodopsin [Haloarcula japonica]
MPEPGSEAIWLWLGTAGMFLGMLYFIGRGWGETDSRRQKFYIATILITAIAFVNYLAMALGFGLTIVEFA
GEEHPIYWARYSDWLFTTPLLLYDLGLLAGADRNTIASLVSLDVLMIGTGLVATLSAGSGVLSAGAERLV
WWGISTAFLLVLLYFLFSSLSGRVADLPSDTRSTFKTLRNLVTVVWLVYPVWWLIGTEGLGLVGIGIETA
GFMVIDLTAKVGFGIILLRSHGVLDGAAETTGAGATATAD

 >gi|1769808|dbj|BAA06680.1| cruxrhodopsin-3 [Haloarcula vallismortis]
MPAPEGEAIWLWLGTAGMFLGMLYFIARGWGETDSRRQKFYIATILITAIAFVNYLAMALGFGLTIVEIA
GEQRPIYWARYSDWLFTTPLLLYDLGLLAGADRNTISSLVSLDVLMIGTGLVATLSAGSGVLSAGAERLV
WWGISTAFLLVLLYFLFSSLSGRVADLPSDTRSTFKTLRNLVTVVWLVYPVWWLVGTEGIGLVGIGIETA
GFMVIDLVAKVGFGIILLRSHGVLDGAAETTGAGATATAD

 >gi|1060883|dbj|BAA06678.1| cruxrhodopsin-1 [Haloarcula argentinensis]
MPEPGSEAIWLWLGTAGMFLGMLYFIARGWGETDSRRQKFYIATILITAIAFVNYLAMALGFGLTIVEFA
GEEHPIYWARYSDWLFTTPLLLYDLGLLAGADRNTITSLVSLDVLMIGTGLVATLSPGSGVLSAGAERLV
WWGISTAFLLVLLYFLFSSLSGRVADLPSDTRSTFKTLRNLVTVVWLVYPVWWLIGTEGIGLVGIGIETA
GFMVIDLTAKVGFGIILLRSHGVLDGAAETTGTGATPADD
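The text that comes back is ordinary FASTA, so it can be turned into a GI-to-sequence mapping with a few lines of code. This is a minimal Python 3 sketch, assuming every header starts with "gi|NNN|" as in the output above; fasta_by_gi is a hypothetical name, and records whose header has no GI are simply skipped.

```python
def fasta_by_gi(text):
    """Map GI (as a string) to its sequence for the FASTA records in text."""
    records = {}
    gi, chunks = None, []
    for line in text.splitlines():
        line = line.strip()  # also tolerates leading spaces and blank lines
        if line.startswith(">"):
            fields = line[1:].split("|")
            gi = fields[1] if len(fields) > 1 and fields[0] == "gi" else None
            chunks = []
            if gi is not None:
                # Store the list now; later sequence lines are appended to it.
                records[gi] = chunks
        elif line and gi is not None:
            chunks.append(line)
    return {k: "".join(v) for k, v in records.items()}

if __name__ == "__main__":
    sample = ">gi|1|gb|A| demo\nMLQ\nSGM\n"
    print(fasta_by_gi(sample))
```

Comparing the keys of the result against the GIs you asked for shows which ones, if any, did not come back.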


I'm working on a cleanup of EUtils to make some of the machinery
disappear. I expect the result will let you do

import EUtils
f = EUtils.efetch("protein", ["914034", "5263173", "1769808", "1060883"],
                  format = "fasta")
print f.read()
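For comparison, here is a rough Python 3 sketch of what a one-call interface like that could look like if it talked to NCBI's EFetch service directly with the standard library. The function names and the mapping of format="fasta" to rettype=fasta plus retmode=text are assumptions for illustration, not the planned EUtils code; the parameter name format just mirrors the proposed call (it shadows the builtin inside the function). NCBI's E-utilities guidelines also ask clients to identify themselves and limit their request rate, which this sketch does not do.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def build_efetch_url(db, ids, format="fasta"):
    """Build an EFetch URL for the given database and ID list (sketch)."""
    # Assumed mapping: only FASTA is handled, requested as plain text.
    if format != "fasta":
        raise ValueError("this sketch only handles format='fasta'")
    query = urlencode({
        "db": db,
        "id": ",".join(str(i) for i in ids),
        "rettype": "fasta",
        "retmode": "text",
    })
    return EFETCH_URL + "?" + query

def efetch(db, ids, format="fasta"):
    """Return a file-like object with the raw EFetch response (needs network)."""
    return urlopen(build_efetch_url(db, ids, format))

if __name__ == "__main__":
    print(build_efetch_url("protein", ["914034", "5263173"]))
```

Keeping URL construction separate from the network call makes the request easy to inspect or test without contacting NCBI.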

Is anyone here using EUtils?  I would like to see some code which
uses it, to make sure I don't break things and to see if I can
improve the API.

					Andrew
					dalke at dalkescientific.com
