[Bioperl-l] get geneID for gene names
Hermann Norpois
hnorpois at googlemail.com
Fri May 4 12:09:52 UTC 2012
Thank you. I am very happy with -db 'gene'. Originally I thought -db unists
was less ambiguous. I combined the suggestions, so my script is now:
#!/bin/perl
use Bio::DB::EUtilities;

open (OUT, "> geneID_list");
open (OUT2, "> genename_ID_list");

while (<>)
{
    $name = $_;
    my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                           -db    => 'gene',
                                           -term  => "$name [Gene Name] AND Mus musculus [Organism]",
                                           -email => 'hnorpois at mpipsykl.mpg.de',
                                          );
    my @ids = $factory->get_ids;
    # print "$name\t@ids[0]\n";
    my $geneids = join(',', @ids); # For the case there is more than one ID.
    print "Fetching GENEID\t$geneids for GENE NAME\t$name\n";
    print OUT "$geneids\n";
    # my %name_id = ($name =>$geneids);
    print OUT2 "$geneids\t$name";
}
But there is still something I do not understand. It is not important, but
$geneids seems to include "\n", because this is what I get on the
screen:
Fetching GENEID 54161 for GENE NAME copg
Fetching GENEID 12064 for GENE NAME bdnf
Fetching GENEID 71661 for GENE NAME 0610005C13RIK
Fetching GENEID 382908 for GENE NAME LOC382908
Fetching GENEID 54633 for GENE NAME PQBP1
Fetching GENEID 258908 for GENE NAME MOR154-1
Thanks
Hermann Norpois
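
A note on the "\n" question above: the stray newline most likely comes from
$name rather than $geneids, because each line read with <> keeps its trailing
newline unless chomp is called on it. The following is only a minimal sketch,
assuming one gene name per input line. build_term is a hypothetical helper
(not part of Bio::DB::EUtilities), and no request is sent to NCBI here; it
only shows how the search term could be built from a cleaned-up name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): turn one raw input line
# into a clean gene name plus an esearch term for the 'gene' database.
sub build_term {
    my ($line) = @_;
    chomp $line;                  # drop the trailing "\n" left by <>
    $line =~ s/^\s+|\s+$//g;      # also trim stray spaces or a "\r"
    # Concatenate instead of interpolating, so that "[Gene Name]" can
    # never be parsed as an array subscript.
    my $term = $line . ' [Gene Name] AND Mus musculus [Organism]';
    return ($line, $term);
}

my ($name, $term) = build_term("copg\n");
print "name=<$name>\n";
print "term=<$term>\n";
```

In the loop of the script above, the same effect can be had by calling
chomp($name) right after `$name = $_;`. The line `print OUT2 "$geneids\t$name";`
would then need an explicit "\n" at the end.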
2012/5/3 Smithies, Russell <Russell.Smithies at agresearch.co.nz>
> If you're looking for gene information, why are you searching UniSTS?
> Unless I've overlooked something, wouldn't it be more useful to search the
> "gene" database and tighten up your query a bit?
>
> #!/bin/perl
> use strict;
> use warnings;
>
> use Bio::DB::EUtilities;
>
> my $factory = Bio::DB::EUtilities->new(
> -eutil => 'esearch',
> -db => 'gene',
> -term => '(copg[Gene Name]) AND mouse[Organism]',
> -email => 'hnorpois at mpipsykl.mpg.de',
> -usehistory => 'y'
> );
>
> my $hist = $factory->next_History || die "No history data returned";
>
> $factory->set_parameters(
> -eutil => 'efetch',
> -history => $hist
> );
>
> print $factory->get_Response->content;
>
>
> --Russell
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:
> bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hermann Norpois
> Sent: Thursday, 3 May 2012 9:01 a.m.
> To: Fields, Christopher J
> Cc: <bioperl-l at lists.open-bio.org>
> Subject: Re: [Bioperl-l] get geneID for gene names
>
> Thank you very much. But there still is a problem.
>
> This is my output:
> 525211,210532,167498,142652
>
> I get some ids (the first one is the UniSTS ID, the following ... I do not
> know) but there is no gene ID. If you compare to the following link:
> http://www.ncbi.nlm.nih.gov/genome/sts/sts.cgi?uid=525211 The gene ID
> should be 54161
> <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=54161>.
>
> This is my (your) script:
>
> #!/bin/perl -w
>
> use Bio::DB::EUtilities;
>
> my $name = "Copg";
> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
> -db => 'unists',
> -term => "$name AND Mus musculus [ORGN]",
> -email => 'hnorpois at mpipsykl.mpg.de',
> );
>
> print join(',',$factory->get_ids)."\n";
>
>
>
> 2012/5/2 Fields, Christopher J <cjfields at illinois.edu>
>
> > Also, a small but very significant bug is in the below. Can you spot it?
> >
> > The '-term' value is in single quotes, these need to be double-quotes
> > to interpolate $name. Otherwise, it is literally looking for '$name'.
> >
> > chris
> >
> > On May 2, 2012, at 12:55 PM, Christopher Fields wrote:
> >
> > > Hermann,
> > >
> > > The below works for me (note I'm using esearch, not efetch). To
> > > actually get the records you will use efetch and the IDs obtained below.
> > >
> > > chris
> > >
> > > ------------------------------
> > > my $name = "Copg";
> > > my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
> > > -db => 'unists',
> > > -term => '$name AND mouse [ORGN]',
> > > -email => '<EMAIL_HERE>',
> > > );
> > >
> > > print join(',',$factory->get_ids)."\n";
> > >
> > >
> > > On May 2, 2012, at 12:42 PM, Hermann Norpois wrote:
> > >
> > >> Hello,
> > >>
> > >> I wish to get gene IDs for gene names (e.g. bdnf, copg). I thought it
> > >> was a good idea to use Bio::DB::EUtilities (see below) and addressed
> > >> UNISTS as database because there it was quite easy to find the gene
> > >> ID. So far I was unable to retrieve the gene ID from UNISTS. Could
> > >> anybody give me a hint how to proceed? The cookbook ... Yes, I was
> > >> trying.
> > >>
> > >> #!/bin/perl -w
> > >>
> > >> use Bio::DB::EUtilities;
> > >>
> > >> my $name = "Copg";
> > >> my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
> > >> -db => 'unists',
> > >> -term => '$name AND mouse [ORGN]',
> > >> -email => 'hnorpois at mpipsykl.mpg.de'
> > >> )
> > >>
> > >>
> > >> Thank you
> > >> Hermann Norpois
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
> >