[Bioperl-l] get geneID for gene names
Smithies, Russell
Russell.Smithies at agresearch.co.nz
Sun May 6 20:20:49 UTC 2012
Looks like it's the $name that has the trailing newline, not $geneids: each line read from the file still carries its line ending. One further cause might be that your file of gene names is in Windows format. General practice is to "chomp" each line as you read it to remove these. I'd also recommend "use strict;" and "use warnings;" at the top of your scripts, as they simplify development and stop simple mistakes creeping in.
E.g.
#!/bin/perl
use warnings;
use strict;
use Bio::DB::EUtilities;
open (OUT, "> geneID_list");
open (OUT2, "> genename_ID_list");
while (<>){
chomp;
my $name = $_;   # "my" is required once "use strict" is on
# ... rest of the loop as in your script ...
}
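One caveat worth illustrating: with the default input record separator, chomp removes only the trailing "\n", so a line from a Windows-format file read on Unix keeps its "\r". The following is a minimal sketch of a more tolerant approach; strip_eol is a hypothetical helper name, not something from BioPerl, and the two sample strings stand in for lines read from a file.

```perl
use strict;
use warnings;

# Remove any trailing CR/LF characters, whatever platform wrote the file.
# (strip_eol is a hypothetical helper name used only for this sketch.)
sub strip_eol {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;
    return $line;
}

# Simulated lines from a Windows-format and a Unix-format file.
for my $raw ("copg\r\n", "bdnf\n") {
    my $chomped = $raw;
    chomp $chomped;                  # strips "\n" only, "\r" survives
    my $clean = strip_eol($raw);
    printf "chomp leaves %d chars, strip_eol leaves %d chars\n",
        length $chomped, length $clean;
}
```

Inside the read loop this would replace the bare chomp, e.g. my $name = strip_eol($_);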
If you have a lot of queries to make (e.g. >10,000) it might be easier to download the gene_info file and grep the data out of that.
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
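As a rough sketch of that local-lookup idea, the code below builds a symbol-to-GeneID map in one pass and then answers any number of queries from memory. It assumes the file is tab-separated with tax_id, GeneID and Symbol as its first three columns (check this against the header line of the file you download), and load_symbols is a hypothetical helper name. The in-memory sample rows are stand-ins for illustration; only the two mouse IDs are taken from the output quoted in this thread.

```perl
use strict;
use warnings;

# Build a lower-cased symbol -> [GeneID, ...] map from gene_info-style lines.
# Assumption: tab-separated columns beginning tax_id, GeneID, Symbol.
sub load_symbols {
    my ($fh, $tax_id) = @_;
    my %id_for;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;              # skip the header line
        $line =~ s/[\r\n]+\z//;             # tolerate either line ending
        my ($tax, $gene_id, $symbol) = split /\t/, $line;
        next unless defined $symbol && $tax eq $tax_id;
        push @{ $id_for{ lc $symbol } }, $gene_id;
    }
    return \%id_for;
}

# Tiny in-memory stand-in for the real file, for illustration only.
my $sample = join '', map { "$_\n" }
    "#tax_id\tGeneID\tSymbol\tLocusTag",
    "10090\t54161\tCopg\t-",
    "10090\t12064\tBdnf\t-",
    "9606\t627\tBDNF\t-";
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $id_for = load_symbols($fh, '10090');    # 10090 = Mus musculus
close $fh;

# Case-insensitive lookup; names with no match are reported as NA.
for my $name (qw(bdnf copg nosuchgene)) {
    my $ids = $id_for->{ lc $name } || [];
    print join("\t", $name, @$ids ? join(',', @$ids) : 'NA'), "\n";
}
```

For real use, the in-memory handle would be replaced by a handle on the downloaded (and uncompressed) file.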
--Russell
From: Hermann Norpois [mailto:hnorpois at googlemail.com]
Sent: Saturday, 5 May 2012 12:10 a.m.
To: Smithies, Russell
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] get geneID for gene names
Thank you. I am very happy with -db `gene'. Originally I thought -db unists was less ambiguous. I combined the suggestions. So my script is:
#!/bin/perl
use Bio::DB::EUtilities;
open (OUT, "> geneID_list");
open (OUT2, "> genename_ID_list");
while (<>)
{
$name = $_;
my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
-db => 'gene',
-term => "$name [Gene Name] AND Mus musculus [Organism]",
-email => 'hnorpois at mpipsykl.mpg.de',
);
my @ids = $factory->get_ids;
# print "$name\t@ids[0]\n";
my $geneids = join(',',@ids); # For the case there is more than one ID.
print "Fetching GENEID\t$geneids for GENE NAME\t$name\n";
print OUT "$geneids\n";
# my %name_id = ($name =>$geneids);
print OUT2 "$geneids\t$name";
}
But there still is something I do not understand. It is not important but ... $geneids seems to include "\n". Because this is what I get on the screen:
Fetching GENEID 54161 for GENE NAME copg
Fetching GENEID 12064 for GENE NAME bdnf
Fetching GENEID 71661 for GENE NAME 0610005C13RIK
Fetching GENEID 382908 for GENE NAME LOC382908
Fetching GENEID 54633 for GENE NAME PQBP1
Fetching GENEID 258908 for GENE NAME MOR154-1
Thanks
Hermann Norpois
2012/5/3 Smithies, Russell <Russell.Smithies at agresearch.co.nz>
If you're looking for gene information, why are you searching UniSTS?
Unless I've overlooked something, wouldn't it be more useful to search the "gene" database and tighten up your query a bit?
#!/bin/perl
use strict;
use warnings;
use Bio::DB::EUtilities;
my $factory = Bio::DB::EUtilities->new(
-eutil => 'esearch',
-db => 'gene',
-term => '(copg[Gene Name]) AND mouse[Organism]',
-email => 'hnorpois at mpipsykl.mpg.de',
-usehistory => 'y'
);
my $hist = $factory->next_History || die "No history data returned";
$factory->set_parameters(
-eutil => 'efetch',
-history => $hist
);
print $factory->get_Response->content;
--Russell
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hermann Norpois
Sent: Thursday, 3 May 2012 9:01 a.m.
To: Fields, Christopher J
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] get geneID for gene names
Thank you very much. But there still is a problem.
This is my output:
525211,210532,167498,142652
I get some ids (the first one is the UniSTS ID, the following ... I do not
know) but there is no gene ID. If you compare to the following link:
http://www.ncbi.nlm.nih.gov/genome/sts/sts.cgi?uid=525211
The gene ID should be 54161
<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=54161>.
This is my (your) script:
#!/bin/perl -w
use Bio::DB::EUtilities;
my $name = "Copg";
my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
-db => 'unists',
-term => "$name AND Mus musculus [ORGN]",
-email => 'hnorpois at mpipsykl.mpg.de',
);
print join(',',$factory->get_ids)."\n";
2012/5/2 Fields, Christopher J <cjfields at illinois.edu>
> Also, a small but very significant bug is in the below. Can you spot it?
>
> The '-term' value is in single quotes, these need to be double-quotes
> to interpolate $name. Otherwise, it is literally looking for '$name'.
>
> chris
>
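The quoting point above can be checked in isolation, without any network access. This small sketch shows the literal versus interpolated term, plus a related pitfall that is an addition here rather than something raised in the thread: if the variable is followed directly by a square bracket, Perl tries to read it as an array element, so the name needs braces.

```perl
use strict;
use warnings;

my $name = 'Copg';

my $single = '$name AND mouse [ORGN]';   # single quotes: no interpolation
my $double = "$name AND mouse [ORGN]";   # double quotes: $name is filled in

# With no space before the bracket, "$name[Gene Name]" would be parsed as an
# element of @name, so the variable name is delimited with braces instead.
my $braced = "${name}[Gene Name] AND mouse[Organism]";

print "$_\n" for $single, $double, $braced;
```

The first line printed is the literal text $name AND mouse [ORGN], which is what esearch was being asked to look up.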
> On May 2, 2012, at 12:55 PM, Christopher Fields wrote:
>
> > Hermann,
> >
> > The below works for me (note I'm using esearch, not efetch). To
> > actually get the records you will use efetch and the IDs obtained below.
> >
> > chris
> >
> > ------------------------------
> > my $name = "Copg";
> > my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
> > -db => 'unists',
> > -term => '$name AND mouse [ORGN]',
> > -email => '<EMAIL_HERE>',
> > );
> >
> > print join(',',$factory->get_ids)."\n";
> >
> >
> > On May 2, 2012, at 12:42 PM, Hermann Norpois wrote:
> >
> >> Hello,
> >>
> >> I wish to get gene IDs for gene names (e.g. bdnf, copg). I thought it
> >> was a good idea to use Bio::DB::EUtilities (see below) and addressed
> >> UNISTS as database because there it was quite easy to find the gene
> >> ID. So far I was unable to retrieve the gene ID from UNISTS. Could
> >> anybody give me a hint how to proceed? The cookbook ... Yes, I was trying.
> >>
> >> #!/bin/perl -w
> >>
> >> use Bio::DB::EUtilities;
> >>
> >> my $name = "Copg";
> >> my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
> >> -db => 'unists',
> >> -term => '$name AND mouse [ORGN]',
> >> -email => 'hnorpois at mpipsykl.mpg.de'
> >> )
> >>
> >>
> >> Thank you
> >> Hermann Norpois
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>