[BioPython] NCBIDictionary and genome database
Tiago Antão
tiagoantao at gmail.com
Fri Jan 26 16:16:52 UTC 2007
Hi,
It works. I would just ask whether it would make sense to include other
databases as well (popset comes to mind).
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Popset
Other parts of the code seem to support this particular one.
Regards,
Tiago
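Tiago's suggestion amounts to letting the caller choose other Entrez databases. With NCBI's EFetch utility, the database is just the `db` query parameter, so the request itself barely changes. Below is a minimal sketch in modern Python 3, not the 2007 Biopython API from this thread. It assumes the current E-utilities endpoint. `efetch_url` is a hypothetical helper, and the sketch does not check whether every database accepts `rettype=gb`:

```python
from urllib.parse import urlencode

# Assumed current E-utilities EFetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db, record_id, rettype="gb", retmode="text"):
    """Build an EFetch URL for one record from the given Entrez database."""
    query = urlencode({"db": db, "id": record_id,
                       "rettype": rettype, "retmode": retmode})
    return EFETCH_BASE + "?" + query


# Only the db value changes between databases; the ID here is made up.
for db in ("nucleotide", "genome", "popset"):
    print(efetch_url(db, "12345"))
```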
On 1/25/07, Michiel Jan Laurens de Hoon <mdehoon at c2b2.columbia.edu> wrote:
> Hi Tiago,
>
> I updated Biopython in CVS with your code in the places where I think
> they are supposed to go. Could you check this new code to make sure it
> still works? You would have to download these two files from CVS:
>
> Bio/GenBank/__init__.py (revision 1.65)
> Bio/dbdefs/genbank.py (revision 1.6)
>
> With these two files, the following should work:
>
> >>> from Bio import GenBank
> >>> parser = GenBank.FeatureParser()
> >>> ncbi_dict = GenBank.NCBIDictionary('genome', 'genbank', parser=parser)
> >>> res = GenBank.search_for('txid8292[orgn]', 'genome')
> >>> gb_entry = ncbi_dict[res[0]]
>
> --Michiel.
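Michiel's snippet relies on NCBIDictionary being a read-only, dictionary-like object, where indexing it with an ID triggers a download. Below is a minimal sketch of that interface in modern Python, with these assumptions: `CachedRecordDict` is a hypothetical name, an injected `fetch` function replaces the network call, and the per-ID cache is a design choice of this sketch. The thread does not say that NCBIDictionary caches records.

```python
class CachedRecordDict:
    """Hypothetical read-only, dictionary-like access to remote records.

    Indexing with a record ID calls `fetch` once and caches the result.
    There is no __setitem__, so item assignment raises TypeError.
    """

    def __init__(self, fetch):
        self._fetch = fetch  # callable: record_id -> record text
        self._cache = {}

    def __getitem__(self, record_id):
        if record_id not in self._cache:
            self._cache[record_id] = self._fetch(record_id)
        return self._cache[record_id]


if __name__ == "__main__":
    # Stand-in for a network request; the ID below is made up.
    def fake_fetch(record_id):
        return "LOCUS       %s" % record_id

    records = CachedRecordDict(fake_fetch)
    print(records["12345"])
```

Omitting `__setitem__` is the simplest way to make the object read-only. Assignment then fails with Python's standard `TypeError`, without any extra code.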
>
> Tiago Antão wrote:
> > Hi,
> >
> > I am trying to download complete genomes: not nuclear, but
> > mitochondrial ones (~17,000 bp each).
> > For instance:
> >
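> > from Bio import GenBank
> > parser = GenBank.FeatureParser()
> > ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank', parser=parser)
> > # genome_genbank_eutils: the EUtilsDB defined in the workaround quoted below
> > ncbi_dict.db = genome_genbank_eutils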
> > res = GenBank.search_for('txid8292[orgn]', 'genome')
> > gb_entry = ncbi_dict[res[0]]
> >
> > In this case I am searching for all amphibian genomes with the query
> > txid8292[orgn]. Or, using the web:
> > http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=8292&lvl=0
> > and choose "Genome Sequences" on the right (73):
> > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome&cmd=Search&dopt=DocSum&term=txid8292[Organism:exp]
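The search step corresponds to NCBI's ESearch utility. The query (here `txid8292[orgn]`, amphibians by taxonomy ID) is sent as the `term` parameter, and the reply lists the matching record IDs. This is presumably the list that `search_for` returns and that `res[0]` indexes into. Below is a minimal sketch in modern Python 3, assuming the current E-utilities endpoint. `esearch_url` and `parse_id_list` are hypothetical helpers, and the sample reply uses made-up IDs rather than real search results:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed current E-utilities ESearch endpoint.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=100):
    """Build an ESearch URL; brackets in the term are percent-encoded."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return ESEARCH_BASE + "?" + query


def parse_id_list(xml_text):
    """Return the <Id> values from an ESearch XML reply, in order."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("./IdList/Id")]


# Illustrative reply with made-up IDs, shaped like ESearch output.
SAMPLE_REPLY = """<eSearchResult>
  <Count>2</Count><RetMax>2</RetMax><RetStart>0</RetStart>
  <IdList><Id>111</Id><Id>222</Id></IdList>
</eSearchResult>"""

print(esearch_url("genome", "txid8292[orgn]"))
ids = parse_id_list(SAMPLE_REPLY)
print(ids[0])  # the ID that would play the role of res[0]
```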
> >
> >
> >
> > On 1/25/07, Michiel Jan Laurens de Hoon <mdehoon at c2b2.columbia.edu> wrote:
> >> Hi Tiago,
> >>
> >> Which GenBank record are you trying to download?
> >> Just so I can replicate the problem and try your workaround.
> >>
> >> --Michiel
> >>
> >> Tiago Antão wrote:
> >> > Hi!
> >> >
> >> > Just a question regarding access to the NCBI genome database from
> >> > NCBIDictionary. In the code there is:
> >> >
> >> > class NCBIDictionary:
> >> >     """Access GenBank using a read-only dictionary interface.
> >> >     """
> >> >     VALID_DATABASES = ['nucleotide', 'protein']
> >> >
> >> > That is, genome is not a valid one.
> >> > Is there a reason for that?
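A class-level whitelist like `VALID_DATABASES` is presumably checked when the dictionary is created, so 'genome' is rejected before any request is made. Below is a minimal sketch of that pattern with hypothetical class names. It is not Biopython's code. It shows that a subclass can widen the whitelist instead of patching an attribute after construction:

```python
class RecordSource:
    """Hypothetical class that accepts only whitelisted Entrez databases."""

    VALID_DATABASES = ["nucleotide", "protein"]

    def __init__(self, database):
        # Reject anything not in the class-level whitelist.
        if database not in self.VALID_DATABASES:
            raise ValueError("Invalid database %r, should be one of %s"
                             % (database, ", ".join(self.VALID_DATABASES)))
        self.database = database


class GenomeRecordSource(RecordSource):
    """Widens the whitelist instead of patching attributes afterwards."""

    VALID_DATABASES = RecordSource.VALID_DATABASES + ["genome"]


try:
    RecordSource("genome")
except ValueError as err:
    print(err)

print(GenomeRecordSource("genome").database)
```

Building the subclass list from the parent's list, rather than appending to it, leaves `RecordSource.VALID_DATABASES` unchanged.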
> >> >
> >> > BTW, I have the following workaround (which might be good or bad...):
> >> >
> >> > from Bio import GenBank
> >> > from Bio.config.DBRegistry import EUtilsDB, DBGroup
> >> > from Bio.dbdefs.genbank import ncbi_failures
> >> > from Bio import db
> >> >
> >> > # EUtils backend that fetches GenBank-format records from the
> >> > # Entrez "genome" database instead of "nucleotide".
> >> > genome_genbank_eutils = EUtilsDB(
> >> >     name="genome-genbank-eutils",
> >> >     doc="Retrieve genome GenBank sequences from NCBI using EUtils",
> >> >     delay=5.0,
> >> >     db="genome",
> >> >     rettype="gb",
> >> >     failure_cases=ncbi_failures
> >> > )
> >> >
> >> > # Create the dictionary with 'nucleotide' (which is on the
> >> > # VALID_DATABASES list), then swap in the genome backend.
> >> > ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank')
> >> > ncbi_dict.db = genome_genbank_eutils
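The workaround replaces the backend configuration object that the dictionary reads from. The `delay = 5.0` setting suggests a minimum pause between requests to NCBI. Below is a sketch of that pattern in modern Python, under these assumptions: `EUtilsSettings` and `PoliteFetcher` are hypothetical names, the `delay` semantics are inferred rather than taken from Biopython's source, and an injected `send` function stands in for the network call. The clock and sleep functions are injectable so the timing can be tested without waiting:

```python
import time
from dataclasses import dataclass


@dataclass
class EUtilsSettings:
    """Hypothetical settings bundle mirroring the EUtilsDB fields above."""
    name: str
    db: str
    rettype: str = "gb"
    delay: float = 5.0  # assumed: minimum seconds between requests


class PoliteFetcher:
    """Calls `send` no more often than once per `settings.delay` seconds."""

    def __init__(self, settings, send, clock=time.monotonic, sleep=time.sleep):
        self.settings = settings
        self._send = send    # callable: (db, record_id, rettype) -> text
        self._clock = clock
        self._sleep = sleep
        self._last = None    # time of the previous request, if any

    def fetch(self, record_id):
        # Wait out whatever remains of the delay since the last request.
        if self._last is not None:
            wait = self.settings.delay - (self._clock() - self._last)
            if wait > 0:
                self._sleep(wait)
        self._last = self._clock()
        s = self.settings
        return self._send(s.db, record_id, s.rettype)


if __name__ == "__main__":
    def fake_send(db, record_id, rettype):
        return "%s/%s as %s" % (db, record_id, rettype)

    fetcher = PoliteFetcher(
        EUtilsSettings("nucleotide-genbank", "nucleotide", delay=0.0), fake_send)
    print(fetcher.fetch("12345"))
    # Swap the backend, as the workaround does with ncbi_dict.db.
    fetcher.settings = EUtilsSettings("genome-genbank-eutils", "genome", delay=0.0)
    print(fetcher.fetch("12345"))
```

Assigning a new `EUtilsSettings` to `fetcher.settings` plays the same role as `ncbi_dict.db = genome_genbank_eutils`. The throttling state stays with the fetcher, so swapping the settings does not reset the delay.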
> >> >
> >> > Regards,
> >> > Tiago
> >>
> >>
> >> --
> >> Michiel de Hoon
> >> Center for Computational Biology and Bioinformatics
> >> Columbia University
> >> 1130 St Nicholas Avenue
> >> New York, NY 10032
> >>
> >
> >
>
>
>
--
Blog (in Portuguese) http://balderikstraat.blogspot.com/