[Bioperl-l] Get GIs from Taxonomy ID

armendarez77 at hotmail.com armendarez77 at hotmail.com
Fri Jun 11 05:08:22 UTC 2010


Thank you for all of the choices.   The 'zgrep' command works a treat.
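
For anyone reading the archive later, here is a rough sketch of turning
that into a GI list file. It is only an illustration: the function name
gis_for_taxid, the file names, and the sample GI/TaxID pairs are all made
up, and it assumes the dump has two whitespace-separated columns (GI, then
TaxID). It compares the TaxID column exactly, because zgrep -w on its own
would also match a line whose GI happens to be 9940.

```shell
#!/bin/sh
# Print every GI whose taxonomy ID (column 2) equals the requested TaxID.
# Usage: gis_for_taxid TAXID DUMP.gz   (hypothetical helper)
gis_for_taxid() {
    # Decompress to stdout and keep only rows whose second column is the
    # TaxID; rows where only the GI (column 1) matches are skipped.
    gzip -dc "$2" | awk -v taxid="$1" '$2 == taxid { print $1 }'
}

# Tiny stand-in for gi_taxid_nucl.dmp.gz with made-up GI/TaxID pairs.
tmpdir=$(mktemp -d)
printf '101\t9940\n9940\t9606\n202\t9940\n303\t99400\n' |
    gzip -c > "$tmpdir/sample.dmp.gz"

# Write one GI per line to a list file, then show it.
gis_for_taxid 9940 "$tmpdir/sample.dmp.gz" > "$tmpdir/taxid9940.gi"
cat "$tmpdir/taxid9940.gi"    # prints 101 and 202, one per line
```

The resulting file has one GI per line, which is the kind of list the
legacy -l option takes; BLAST+ accepts the same kind of file through
-gilist. For a protein database such as nr, the input would presumably be
the protein dump from the same FTP directory rather than the nucleotide
one.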


Veronica


> From: Russell.Smithies at agresearch.co.nz
> To: cjfields at illinois.edu
> CC: David.Messina at sbc.su.se; armendarez77 at hotmail.com; bioperl-l at lists.open-bio.org
> Date: Fri, 11 Jun 2010 09:43:55 +1200
> Subject: RE: [Bioperl-l] Get GIs from Taxonomy ID
> 
> That's the way I usually do it, as NCBI/eUtils can be a bit flaky.
> Not BioPerl's fault, of course ;-)
> 
>     zgrep -w 9940 gi_taxid_nucl.dmp.gz | awk '$2 == 9940 {print $1}'
> 
> 
> 
> --Russell
> 
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Friday, 11 June 2010 9:36 a.m.
> > To: Smithies, Russell
> > Cc: 'Dave Messina'; 'armendarez77 at hotmail.com'; 'bioperl-l at lists.open-bio.org'
> > Subject: Re: [Bioperl-l] Get GIs from Taxonomy ID
> > 
> > You can get up-to-date files mapping GI to TaxID here (nr and nt):
> > 
> > ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/
> > 
> > chris
> > 
> > On Jun 10, 2010, at 4:11 PM, Smithies, Russell wrote:
> > 
> > > Eutils will do it with the right query:
> > >
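> > > use strict;
> > > use warnings;
> > > use Bio::DB::EUtilities;
> > >
> > > # esearch the nucleotide database for records assigned directly to
> > > # taxon 9940; [Organism:noexp] leaves out descendant taxa
> > > my $factory = Bio::DB::EUtilities->new(-eutil  => 'esearch',
> > >                                        -db     => 'nucleotide',
> > >                                        -term   => 'txid9940[Organism:noexp]',
> > >                                        -email  => 'mymail@foo.bar',
> > >                                        -retmax => 1000000);
> > >
> > > # number of query hits
> > > print "Count = ", $factory->get_count, "\n";
> > > # UIDs (GIs) of the hits
> > > my @ids = $factory->get_ids;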
> > >
> > >
> > > --Russell
> > >
> > >
> > >> -----Original Message-----
> > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dave Messina
> > >> Sent: Friday, 11 June 2010 8:01 a.m.
> > >> To: armendarez77 at hotmail.com
> > >> Cc: bioperl-l at lists.open-bio.org
> > >> Subject: Re: [Bioperl-l] Get GIs from Taxonomy ID
> > >>
> > >> Hi Veronica,
> > >>
> > >> These days when you run BLAST at the NCBI server, you're running BLAST+,
> > >> which is their complete rewrite of (and replacement for) BLAST.
> > >>
> > >> You can also download BLAST+ and do pretty much everything on your local
> > >> machine that you can do on their server, including limiting by taxonomy.
> > >>
> > >> I think this is the right parameter:
> > >>
> > >> 	http://www.ncbi.nlm.nih.gov/blast/html/blastcgihelp.html#entrez_query
> > >>
> > >> Incidentally, BLAST+ has this awesome feature whereby you can run
> > >> searches remotely on their server, against their databases, from the
> > >> command line, just by adding the -remote flag.
> > >>
> > >>
> > >> (You can run BLAST+ via the BioPerl wrapper module StandAloneBlastPlus,
> > >> by the way.)
> > >>
> > >> Dave
> > >>
> > >>
> > >>
> > >> On Jun 10, 2010, at 4:54 PM, <armendarez77 at hotmail.com> wrote:
> > >>
> > >>>
> > >>> Hello,
> > >>>
> > >>> Is there a BioPerl method that will give a list of GIs for a specified
> > >>> NCBI taxonomy ID?
> > >>>
> > >>> I've previously tried using the URL API to BLAST primers against the nr
> > >>> database on the NCBI server, but recently I keep getting a 'Bad Gateway'
> > >>> error.  While my system admin is looking into this, I've decided to go
> > >>> another route.  Therefore, I've downloaded the NCBI nr database.
> > >>>
> > >>> The problem I've run into is restricting the BLAST against the nr
> > >>> database to a subset of sequences.  The NCBI BLAST tools have an option
> > >>> (-l) that does this, but it requires a list of GIs.
> > >>>
> > >>> When I was using the URL API, I restricted sequences using taxonomy IDs
> > >>> (Entrez Query).  Therefore, is there a way to get all GIs within a
> > >>> taxonomy ID?  I've seen that with Bio::Taxonomy I can give a GI and get
> > >>> a tax ID, but not the reverse.
> > >>>
> > >>>
> > >>> Thank you,
> > >>>
> > >>> Veronica
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> Bioperl-l mailing list
> > >>> Bioperl-l at lists.open-bio.org
> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>
> > >>
> > >
> 
 		 	   		  


