[Bioperl-l] Bio::DB::EntrezGene or Bio::DB::Query::GenBank to obtain sequence metadata without sequence
Smithies, Russell
Russell.Smithies at agresearch.co.nz
Sun Oct 11 19:46:59 UTC 2009
I guess it depends on what you've got to start with, how many queries you need to run, and which species you're interested in.
For example, if you want metadata on all human genes, I'd probably do it "manually" from NCBI's website by searching the gene database for "human[orgn]", switching to the "gene table" view, then saving the results to a file.
That gives you an easily parsed text file with contents like the following:
------------------------------------------
1: TGFB1 transforming growth factor, beta 1 [ Homo sapiens ]
GeneID: 7040 updated 07-Oct-2009
RefSeq status: REVIEWED
total gene size: 23166 bp
mRNA          bp     exons   Protein       aa    exons
NM_000660.3   2346   7       NP_000651.3   390   7
Exon information:
NM_000660.3 length: 2346 bp, number of exons: 7
NP_000651.3 length: 390 aa, number of exons: 7
EXON                       Coding EXON                INTRON
coords          length     coords          length     coords          length
1 - 1222        1222 bp    868 - 1222      355 bp     1223 - 5456     4234 bp
5457 - 5617     161 bp     5457 - 5617     161 bp     5618 - 9047     3430 bp
9048 - 9165     118 bp     9048 - 9165     118 bp     9166 - 11664    2499 bp
11665 - 11742   78 bp      11665 - 11742   78 bp      11743 - 11881   139 bp
11882 - 12029   148 bp     11882 - 12029   148 bp     12030 - 21630   9601 bp
21631 - 21784   154 bp     21631 - 21784   154 bp     21785 - 22701   917 bp
22702 - 23166   465 bp     22702 - 22860   159 bp
------------------------------------------
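As a rough illustration of how little work the parsing takes, here is a minimal Python sketch (not BioPerl) that reads one record in the layout shown above. The function name, the regular expressions and the dictionary keys are my own choices for illustration, not part of any NCBI or BioPerl API. It assumes whitespace-separated columns, and it tells a coding span from an intron span by coordinates (an intron starts one base after the exon ends), which is an assumption drawn from the sample rather than a documented rule.

```python
import re

# One record copied from the sample gene-table output above.
SAMPLE = """\
1: TGFB1 transforming growth factor, beta 1 [ Homo sapiens ]
GeneID: 7040 updated 07-Oct-2009
RefSeq status: REVIEWED
total gene size: 23166 bp
mRNA bp exons Protein aa exons
NM_000660.3 2346 7 NP_000651.3 390 7
Exon information:
NM_000660.3 length: 2346 bp, number of exons: 7
NP_000651.3 length: 390 aa, number of exons: 7
EXON Coding EXON INTRON
coords length coords length coords length
1 - 1222 1222 bp 868 - 1222 355 bp 1223 - 5456 4234 bp
5457 - 5617 161 bp 5457 - 5617 161 bp 5618 - 9047 3430 bp
9048 - 9165 118 bp 9048 - 9165 118 bp 9166 - 11664 2499 bp
11665 - 11742 78 bp 11665 - 11742 78 bp 11743 - 11881 139 bp
11882 - 12029 148 bp 11882 - 12029 148 bp 12030 - 21630 9601 bp
21631 - 21784 154 bp 21631 - 21784 154 bp 21785 - 22701 917 bp
22702 - 23166 465 bp 22702 - 22860 159 bp
"""

# Hypothetical patterns, written against the sample layout only.
HEADER_RE = re.compile(r"^\d+:\s+(\S+)\s+(.*?)\s*\[\s*(.*?)\s*\]\s*$")
GENEID_RE = re.compile(r"^GeneID:\s+(\d+)\s+updated\s+(\S+)")
SIZE_RE = re.compile(r"^total gene size:\s+(\d+)\s+bp")
ROW_RE = re.compile(r"^\d+\s*-\s*\d+")
SPAN_RE = re.compile(r"(\d+)\s*-\s*(\d+)\s+(\d+)\s*bp")


def parse_gene_record(text):
    """Parse a single gene-table record into a plain dictionary."""
    record = {"exons": []}
    for raw in text.splitlines():
        line = raw.strip()
        m = HEADER_RE.match(line)
        if m:
            record["symbol"], record["description"], record["organism"] = m.groups()
            continue
        m = GENEID_RE.match(line)
        if m:
            record["gene_id"] = int(m.group(1))
            record["updated"] = m.group(2)
            continue
        m = SIZE_RE.match(line)
        if m:
            record["gene_size_bp"] = int(m.group(1))
            continue
        if ROW_RE.match(line):
            # Each span is a (start, end, length) tuple of integers.
            spans = [tuple(int(x) for x in s) for s in SPAN_RE.findall(line)]
            if not spans:
                continue
            row = {"exon": spans[0], "coding": None, "intron": None}
            for span in spans[1:]:
                # Assumption: an intron begins right after the exon ends;
                # any other extra span is the coding part of the exon.
                if span[0] == spans[0][1] + 1:
                    row["intron"] = span
                else:
                    row["coding"] = span
            record["exons"].append(row)
    return record


record = parse_gene_record(SAMPLE)
print(record["symbol"], record["gene_id"], len(record["exons"]))
```

A saved file with many genes would need to be split into records first (for example on the numbered header lines) before calling the function on each one.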
Or you could try using Bio::DB::EUtilities, specifying 'gene' as the database and 'table' as the rettype.
I'm not sure which rettype values Bio::DB::EUtilities allows, but it should be in the docs.
Take a look at http://www.bioperl.org/wiki/Getting_Genomic_Sequences or http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook
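For reference, Bio::DB::EUtilities is a wrapper around NCBI's E-utilities web service, so the same request can be sketched as a plain URL. The Python snippet below only builds the efetch URL and makes no network call. The function name is hypothetical, and the rettype value "gene_table" is my assumption from the current E-utilities documentation (the message above says 'table'), so check the valid values before relying on it.

```python
from urllib.parse import urlencode

# Base URL of NCBI's efetch E-utility.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def gene_table_url(gene_ids, rettype="gene_table"):
    """Build an efetch URL asking the gene database for a text gene table.

    The default rettype is an assumption; verify it against the
    E-utilities documentation for the gene database.
    """
    params = {
        "db": "gene",
        "id": ",".join(str(g) for g in gene_ids),
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


# GeneID 7040 is TGFB1, the record shown in the sample output.
print(gene_table_url([7040]))
```

The resulting URL can be opened in a browser or fetched with any HTTP client. For many IDs, NCBI's usage guidelines on request rates apply.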
Hope this helps,
Russell Smithies
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Dan Kortschak
> Sent: Friday, 9 October 2009 7:54 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::EntrezGene or Bio::DB::Query::GenBank to obtain
> sequence metadata without sequence
>
> Hi,
>
> I am looking to query NCBI for sequence metadata (LOCUS/length,
> DEFINITION/name etc) without obtaining the sequence associated with the
> entry (pulling sequence data for chromosome when only the metadata is
> needed is a waste).
>
> I'm wondering what would be the most appropriate bioperl module to use -
> Bio::DB::EntrezGene or Bio::DB::Query::GenBank seem like the best bet
> and from the description the latter seems best, but I'm wondering if
> this is best and what database would both provide this data and be
> parsable.
>
> thanks for any help.
> Dan