[Bioperl-l] get geneID for gene names

Smithies, Russell Russell.Smithies at agresearch.co.nz
Sun May 6 20:20:49 UTC 2012


It looks like it's $name that has the trailing newline, and I suspect one cause might be that your file of gene names is in Windows format. General practice is to put a "chomp" in while doing reads to remove these. I'd also recommend "use strict;" and "use warnings;" in your headers, as they simplify development and prevent simple mistakes creeping in.
E.g.

#!/bin/perl

use warnings;
use strict;

use Bio::DB::EUtilities;

open (OUT, "> geneID_list");
open (OUT2, "> genename_ID_list");

while (<>){
   chomp;
   my $name = $_;   # "my" is required once "use strict" is on
   # ... rest of the loop body as in your script ...
}
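
A minimal sketch of the Windows-format point, offered as an illustration rather than as part of the original script: on a Unix system "chomp" removes only the trailing "\n", so a line from a Windows-format file still ends in "\r". Stripping all trailing whitespace handles both cases. The helper name clean_name is hypothetical and not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: remove any trailing line ending or whitespace,
# whether the line ends in "\n" (Unix) or "\r\n" (Windows).
sub clean_name {
    my ($line) = @_;
    $line =~ s/\s+\z//;   # chomp alone would leave a "\r" behind on Unix
    return $line;
}

# Three sample inputs: Unix ending, Windows ending, no ending at all.
for my $raw ("copg\n", "bdnf\r\n", "PQBP1") {
    my $name = clean_name($raw);
    print "[$name]\n";
}
```

The brackets in the printed output make any leftover invisible character easy to spot.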

If you have a lot of queries to make (i.e. >10,000) it might be easier to download the geneinfo list and grep the data out of that.
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
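
A rough sketch of the local-lookup idea, under stated assumptions: the downloaded gene_info file is taken to be tab-separated with tax_id, GeneID and Symbol as its first three columns and a header line starting with "#", and 10090 is the NCBI taxonomy ID for Mus musculus. The sub name symbol_to_geneids and the in-memory sample rows are illustrative only; in practice you would open the decompressed file instead.

```perl
use strict;
use warnings;

# Hypothetical helper: build a lowercase-symbol -> [GeneIDs] lookup
# for one organism from gene_info-style lines.
sub symbol_to_geneids {
    my ($fh, $tax_id) = @_;
    my %ids;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;            # skip the header line
        $line =~ s/\s+\z//;               # strip the line ending
        my ($tax, $gene_id, $symbol) = split /\t/, $line;
        next unless defined $symbol && $tax == $tax_id;
        push @{ $ids{ lc $symbol } }, $gene_id;
    }
    return \%ids;
}

# Illustrative sample rows standing in for the real file.
my $sample = "#tax_id\tGeneID\tSymbol\n"
           . "10090\t54161\tCopg\n"
           . "10090\t12064\tBdnf\n"
           . "9606\t627\tBDNF\n";

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $ids = symbol_to_geneids($fh, 10090);
close $fh;

for my $name (qw(bdnf copg)) {
    my $hits = $ids->{ lc $name } || [];
    print "$name\t", join(',', @$hits), "\n";
}
```

Keeping a list per symbol means a name that maps to more than one GeneID is reported in full rather than silently overwritten.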

--Russell


From: Hermann Norpois [mailto:hnorpois at googlemail.com]
Sent: Saturday, 5 May 2012 12:10 a.m.
To: Smithies, Russell
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] get geneID for gene names

Thank you. I am very happy with -db `gene'. Originally I thought -db unists was less ambiguous. I combined the suggestions. So my script is:

#!/bin/perl

use Bio::DB::EUtilities;

open (OUT, "> geneID_list");
open (OUT2, "> genename_ID_list");

while (<>)
  {
   $name = $_;


 my $factory = Bio::DB::EUtilities->new(-eutil  => 'esearch',
                                     -db     => 'gene',
                                     -term   => "$name [Gene Name] AND Mus musculus   [Organism]",
                                     -email  => 'hnorpois at mpipsykl.mpg.de',
                                     );

 my @ids = $factory->get_ids;
# print "$name\t@ids[0]\n";
 my $geneids = join(',',@ids);           # For the case there is more than one ID.
 print "Fetching GENEID\t$geneids for GENE NAME\t$name\n";
 print OUT "$geneids\n";
# my %name_id = ($name =>$geneids);
 print OUT2 "$geneids\t$name";
   }

But there still is something I do not understand. It is not important but ... $geneids seems to include "\n". Because this is what I get on the screen:

Fetching GENEID    54161 for GENE NAME    copg

Fetching GENEID    12064 for GENE NAME    bdnf

Fetching GENEID    71661 for GENE NAME    0610005C13RIK

Fetching GENEID    382908 for GENE NAME    LOC382908

Fetching GENEID    54633 for GENE NAME    PQBP1

Fetching GENEID    258908 for GENE NAME    MOR154-1

Thanks
Hermann Norpois

2012/5/3 Smithies, Russell <Russell.Smithies at agresearch.co.nz>
If you're looking for gene information, why are you searching UniSTS?
Unless I've overlooked something, wouldn't it be more useful to search the "gene" database and tighten up your query a bit?

#!/bin/perl
use strict;
use warnings;

use Bio::DB::EUtilities;

my $factory = Bio::DB::EUtilities->new(
   -eutil      => 'esearch',
   -db         => 'gene',
   -term       => '(copg[Gene Name]) AND mouse[Organism]',
   -email      => 'hnorpois at mpipsykl.mpg.de',
   -usehistory => 'y'
);

my $hist = $factory->next_History || die "No history data returned";

$factory->set_parameters(
   -eutil   => 'efetch',
   -history => $hist
);

print $factory->get_Response->content;


--Russell

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hermann Norpois
Sent: Thursday, 3 May 2012 9:01 a.m.
To: Fields, Christopher J
Cc: <bioperl-l at lists.open-bio.org>
Subject: Re: [Bioperl-l] get geneID for gene names

Thank you very much. But there still is a problem.

This is my output:
525211,210532,167498,142652

I get some ids (the first one is the UniSTS ID, the following ... I do not
know) but there is no gene ID. If you compare to the following link:
http://www.ncbi.nlm.nih.gov/genome/sts/sts.cgi?uid=525211
The gene ID should be 54161 (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=54161).

This is my (your) script:

#!/bin/perl -w

use Bio::DB::EUtilities;

my $name = "Copg";
my $factory = Bio::DB::EUtilities->new(-eutil  => 'esearch',
                                    -db     => 'unists',
                                    -term   => "$name AND Mus musculus [ORGN]",
                                    -email  => 'hnorpois at mpipsykl.mpg.de',
                                    );

print join(',',$factory->get_ids)."\n";



2012/5/2 Fields, Christopher J <cjfields at illinois.edu>

> Also, a small but very significant bug is in the below.  Can you spot it?
>
> The '-term' value is in single quotes, these need to be double-quotes
> to interpolate $name.  Otherwise, it is literally looking for '$name'.
>
> chris
>
> On May 2, 2012, at 12:55 PM, Christopher Fields wrote:
>
> > Hermann,
> >
> > The below works for me (note I'm using esearch, not efetch).  To
> > actually get the records you will use efetch and the IDs obtained below.
> >
> > chris
> >
> > ------------------------------
> > my $name = "Copg";
> > my $factory = Bio::DB::EUtilities->new(-eutil  => 'esearch',
> >                                      -db     => 'unists',
> >                                      -term   => '$name AND mouse [ORGN]',
> >                                      -email  => '<EMAIL_HERE>',
> >                                      );
> >
> > print join(',',$factory->get_ids)."\n";
> >
> >
> > On May 2, 2012, at 12:42 PM, Hermann Norpois wrote:
> >
> >> Hello,
> >>
> >> I wish to get gene IDs for gene names (e.g. bdnf, copg). I thought it
> >> was a good idea to use Bio::DB::EUtilities (see below) and addressed
> >> UNISTS as database because there it was quite easy to find the gene
> >> ID. So far I was unable to retrieve the gene ID from UNISTS. Could
> >> anybody give me a hint how to proceed? The cookbook ... Yes, I was
> >> trying.
> >>
> >> #!/bin/perl -w
> >>
> >> use Bio::DB::EUtilities;
> >>
> >> my $name = "Copg";
> >> my $factory = Bio::DB::EUtilities->new(-eutil  => 'efetch',
> >>                                      -db     => 'unists',
> >>                                      -term   => '$name AND mouse [ORGN]',
> >>                                      -email  => 'hnorpois at mpipsykl.mpg.de'
> >>                                      )
> >>
> >>
> >> Thank you
> >> Hermann Norpois
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>