[Bioperl-l] fetch gene sequence with EUtilities.pm

Chris Fields cjfields at illinois.edu
Wed Jun 10 13:20:43 UTC 2009


EntrezGene doesn't contain the sequence itself; I believe it just
links to the sequence in a specified nucleotide record with given
coordinates.  You can get to it, but it takes a little trickery: in
essence you need to use the UID to get the gene summary information,
extract the accession and coordinates from that, then grab the
sequence record using the efetch seq_start, seq_stop, and strand
parameters.
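
As a rough illustration of the first step (getting the gene summary for a UID), here is a minimal sketch that only builds the ESummary request URL. It is plain Python using the standard library rather than BioPerl, and it assumes NCBI's public E-utilities base URL and the "gene" database name; the function name esummary_url is hypothetical. No request is actually sent.

```python
from urllib.parse import urlencode

# Assumed base URL of NCBI's public E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esummary_url(gene_uid, email=None):
    """Build (but do not send) an ESummary request URL for an Entrez Gene UID."""
    params = {"db": "gene", "id": str(gene_uid)}
    if email:
        # NCBI asks clients to identify themselves; optional in this sketch.
        params["email"] = email
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"


print(esummary_url(18131))
```

The response to such a request is where the accession and coordinates would be read from before the second (fetch) step.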

A dump of the esummary info for UID 18131, for instance (using
$eutil->print_all), gives this (abbreviated somewhat):

UID                 :18131
Name                :Notch3
Description         :Notch gene homolog 3 (Drosophila)
Orgname             :Mus musculus
...
GenomicInfo
     GenomicInfoType
         ChrLoc      :17
         ChrAccVer   :NC_000083.5
         ChrStart    :32303796
         ChrStop     :32257837
GeneWeight          :23049

The genomic info section gives the accession.version, start, end, and
(implicitly) the strand: ChrStop is less than ChrStart, so the gene is
on the minus strand. I have added an example to the cookbook:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F
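
The coordinate handling described above can also be sketched independently of BioPerl. The following is a minimal Python illustration, not the cookbook code: it infers the strand from the order of ChrStart and ChrStop and builds an EFetch URL for the region. Several things here are assumptions: that the summary coordinates are 0-based (so 1 is added for EFetch), that EFetch wants the smaller coordinate as seq_start, the choice of the "nuccore" database and FASTA output, and the hypothetical function name efetch_region_url. No request is sent.

```python
from urllib.parse import urlencode

# Assumed base URL of NCBI's public E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_region_url(acc_ver, chr_start, chr_stop, zero_based=True):
    """Build (but do not send) an EFetch URL for the region in a gene summary."""
    start, stop = int(chr_start), int(chr_stop)
    # ChrStop < ChrStart signals the minus strand, which EFetch calls strand 2.
    strand = 2 if stop < start else 1
    # Assumption: summary coordinates are 0-based, EFetch expects 1-based.
    offset = 1 if zero_based else 0
    low, high = sorted((start, stop))
    params = {
        "db": "nuccore",
        "id": acc_ver,
        "seq_start": low + offset,
        "seq_stop": high + offset,
        "strand": strand,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Values taken from the Notch3 summary dump above.
print(efetch_region_url("NC_000083.5", 32303796, 32257837))
```

With the Notch3 values, ChrStart is larger than ChrStop, so the sketch requests strand 2 (the reverse complement of the stated region).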

chris

On Jun 9, 2009, at 6:20 AM, Adam Witney wrote:

> Hi,
>
> I have been experimenting with the Bio::DB::EUtilities module, with  
> help from the Cookbook. But I can't seem to figure out how to get  
> the DNA sequence of a gene; all the examples seem to be fetching  
> protein sequence.
>
> How would I go about fetching a sequence using an Entrez GeneID?
>
> thanks for any help
>
> adam