[Bioperl-l] Bio::DB::EntrezGene or Bio::DB::Query::GenBank to obtain sequence metadata without sequence

Sun Oct 11 19:46:59 UTC 2009

I guess it depends on what you've got to start with, how many queries you need to run, and which species you're interested in.
For example, if you want metadata on all human genes, I'd probably do it "manually" from NCBI's website: search the gene database for "human[orgn]", switch to the "gene table" view, then save the results to a file.
That gives you an easily parsed text file with contents like this:
------------------------------------------
1: TGFB1 transforming growth factor, beta 1 [ Homo sapiens ] 
GeneID: 7040 updated 07-Oct-2009
RefSeq status: REVIEWED
total gene size: 23166 bp

mRNA          bp     exons   Protein       aa    exons
NM_000660.3   2346   7       NP_000651.3   390   7

Exon information:

NM_000660.3 length: 2346 bp, number of exons: 7
NP_000651.3 length: 390 aa, number of exons: 7

EXON                      Coding EXON               INTRON
coords          length    coords          length    coords          length
1 - 1222        1222 bp   868 - 1222      355 bp    1223 - 5456     4234 bp
5457 - 5617     161 bp    5457 - 5617     161 bp    5618 - 9047     3430 bp
9048 - 9165     118 bp    9048 - 9165     118 bp    9166 - 11664    2499 bp
11665 - 11742   78 bp     11665 - 11742   78 bp     11743 - 11881   139 bp
11882 - 12029   148 bp    11882 - 12029   148 bp    12030 - 21630   9601 bp
21631 - 21784   154 bp    21631 - 21784   154 bp    21785 - 22701   917 bp
22702 - 23166   465 bp    22702 - 22860   159 bp

------------------------------------------
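
To show what "easily parsed" might look like in practice, here is a minimal sketch in Python (not BioPerl) that pulls the metadata and exon rows out of one record like the one above. The function name, the dictionary keys and the regular expressions are my own assumptions based only on this sample, so other records may need adjustments. In particular, it assumes every exon row has a coding-exon column; a non-coding exon would shift the columns.

```python
import re

# One record copied from the gene table output shown above.
SAMPLE = """\
1: TGFB1 transforming growth factor, beta 1 [ Homo sapiens ]
GeneID: 7040 updated 07-Oct-2009
RefSeq status: REVIEWED
total gene size: 23166 bp

EXON      Coding EXON      INTRON
coords   length      coords   length      coords   length
1 - 1222   1222 bp  868 - 1222   355 bp  1223 - 5456   4234 bp
5457 - 5617   161 bp  5457 - 5617   161 bp  5618 - 9047   3430 bp
9048 - 9165   118 bp  9048 - 9165   118 bp  9166 - 11664   2499 bp
11665 - 11742   78 bp  11665 - 11742   78 bp  11743 - 11881   139 bp
11882 - 12029   148 bp  11882 - 12029   148 bp  12030 - 21630   9601 bp
21631 - 21784   154 bp  21631 - 21784   154 bp  21785 - 22701   917 bp
22702 - 23166   465 bp  22702 - 22860   159 bp
"""

# Patterns are guesses derived from the sample record only.
HEADER_RE = re.compile(r"^\d+:\s+(\S+)\s+(.*?)\s*\[\s*(.*?)\s*\]\s*$")
GENEID_RE = re.compile(r"^GeneID:\s+(\d+)\s+updated\s+(\S+)")
STATUS_RE = re.compile(r"^RefSeq status:\s+(\S+)")
SIZE_RE = re.compile(r"^total gene size:\s+(\d+)\s+bp")
ROW_RE = re.compile(r"^\d+\s*-\s*\d+\s")            # line starts with "start - end"
SPAN_RE = re.compile(r"(\d+)\s*-\s*(\d+)\s+(\d+)\s*bp")  # start, end, length


def parse_gene_table(text):
    """Parse one gene table record into a dict of metadata plus exon rows."""
    record = {"exons": []}
    for raw in text.splitlines():
        line = raw.strip()
        m = HEADER_RE.match(line)
        if m:
            record["symbol"], record["description"], record["organism"] = m.groups()
            continue
        m = GENEID_RE.match(line)
        if m:
            record["gene_id"] = int(m.group(1))
            record["updated"] = m.group(2)
            continue
        m = STATUS_RE.match(line)
        if m:
            record["refseq_status"] = m.group(1)
            continue
        m = SIZE_RE.match(line)
        if m:
            record["gene_size_bp"] = int(m.group(1))
            continue
        if ROW_RE.match(line):
            # Each span is (start, end, length); the last row has no intron.
            spans = [tuple(int(n) for n in s) for s in SPAN_RE.findall(line)]
            spans += [None] * (3 - len(spans))
            record["exons"].append(dict(zip(("exon", "coding", "intron"), spans)))
    return record


if __name__ == "__main__":
    rec = parse_gene_table(SAMPLE)
    print(rec["symbol"], rec["gene_id"], rec["gene_size_bp"], len(rec["exons"]))
```

A saved file with many genes would need to be split into records first, for example on the numbered header line that starts each one.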

Or you could try using Bio::DB::EUtilities, specifying 'gene' as the database and 'table' as the rettype.
I'm not sure which rettype values are allowed under Bio::DB::EUtilities, but they should be listed in the docs.
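
For reference, the underlying E-utilities request can be sketched without BioPerl. The Python snippet below only builds an efetch URL for the gene database and does not contact NCBI. The helper name is hypothetical, and the default rettype of "gene_table" is my assumption from NCBI's efetch documentation rather than something confirmed in this thread, so check it against the current docs before relying on it.

```python
from urllib.parse import urlencode

# Base URL of NCBI's efetch E-utility.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def gene_table_url(gene_ids, rettype="gene_table", email=None):
    """Build an efetch URL for one or more Gene IDs (hypothetical helper).

    rettype defaults to "gene_table", which is an assumption; pass another
    value if the documentation lists a different one.
    """
    params = {
        "db": "gene",
        "id": ",".join(str(g) for g in gene_ids),
        "rettype": rettype,
        "retmode": "text",
    }
    if email:
        # NCBI asks automated clients to identify themselves.
        params["email"] = email
    return EFETCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # 7040 is the GeneID of TGFB1 from the sample record.
    print(gene_table_url([7040]))
```

Fetching the resulting URL should return plain text; whether it matches the "gene table" view saved from the website is something to verify rather than assume.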

Take a look at http://www.bioperl.org/wiki/Getting_Genomic_Sequences or http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

Hope this helps,

Russell Smithies 

Bioinformatics Applications Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 

Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Dan Kortschak
> Sent: Friday, 9 October 2009 7:54 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::EntrezGene or Bio::DB::Query::GenBank to obtain
> sequence metadata without sequence
> 
> Hi,
> 
> I am looking to query NCBI for sequence metadata (LOCUS/length,
> DEFINITION/name etc) without obtaining the sequence associated with the
> entry (pulling sequence data for chromosome when only the metadata is
> needed is a waste).
> 
> I'm wondering what would be the most appropriate bioperl module to use -
> Bio::DB::EntrezGene or Bio::DB::Query::GenBank seem like the best bet
> and from the description the latter seems best, but I'm wondering if
> this is best and what database would both provide this data and be
> parsable.
> 
> thanks for any help.
> Dan
> 