[Bioperl-l] GenBank gene field
Alexander Kozik
akozik at atgc.org
Fri Jan 21 19:21:15 EST 2005
Please take a look at two sample records from GenBank files (Arabidopsis
and C. elegans).
The C. elegans file has "/gene" entries for both the "gene" and "CDS" fields.
The Arabidopsis file has no "/gene" entries at all.
The previous version of the Arabidopsis GenBank file did have "/gene" entries.
Could you help me understand why this happens, and which entry you would
suggest extracting if a user is interested in the corresponding gene names?
Am I using the terms "entry" and "field" properly?
Thanks a lot in advance,
Alexander Kozik
Bioinformatics Specialist
Genome and Biomedical Sciences Facility
451 East Health Sciences Drive
University of California
Davis, CA 95616-8816
Phone: (530) 754-9127
email: akozik at atgc.org
web: http://www.atgc.org/
----
Arabidopsis GenBank file NC_003070.gbk:
gene complement(38753..40944)
/locus_tag="At1g01070"
/note="synonym: T25K16.7; nodulin MtN21 family protein"
/db_xref="GeneID:839550"
...
CDS complement(join(38898..39054,39136..39287,39409..39814,
40213..40329,40473..40535,40675..40877))
/locus_tag="At1g01070"
/note="similar to MtN21 GI:2598575 (root nodule
development) from [Medicago truncatula]"
/codon_start=1
/protein_id="NP_563617.1"
/db_xref="GI:18378792"
/db_xref="GeneID:839550"
/translation="MAG...
----
C. elegans GenBank file NC_003279.gbk:
gene 43733..44677
/gene="1A519"
/locus_tag="1A519"
/synonym="Y74C9A.1"
/note="Title: Caenorhabditis elegans expressed gene
1A519."
...
CDS join(43733..43961,44030..44234,44281..44328,44521..44677)
/gene="1A519"
/locus_tag="1A519"
/codon_start=1
/product="putative protein (1A519)"
/protein_id="17510627"
/db_xref="GI:17510627"
...
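----
One possible way to handle both record styles is to read the "/gene" qualifier when it is present and fall back to "/locus_tag" otherwise. The sketch below is a minimal plain-Python illustration of that idea. It is not BioPerl and not a full GenBank parser. The names parse_features and gene_name are hypothetical, and the sample text is a shortened copy of the two records above with the truncated /translation line left out. It assumes the flattened layout shown in this message, where every qualifier starts its own line with "/".

```python
import re

# A qualifier line such as /gene="1A519" or /codon_start=1
QUALIFIER = re.compile(r'^/(\w+)=(.*)$')
# A feature key line; only the two keys discussed here are recognised
FEATURE = re.compile(r'^(gene|CDS)\b')

ARABIDOPSIS = '''
gene complement(38753..40944)
/locus_tag="At1g01070"
/note="synonym: T25K16.7; nodulin MtN21 family protein"
/db_xref="GeneID:839550"
CDS complement(join(38898..39054,39136..39287,39409..39814,
40213..40329,40473..40535,40675..40877))
/locus_tag="At1g01070"
/note="similar to MtN21 GI:2598575 (root nodule
development) from [Medicago truncatula]"
/codon_start=1
/protein_id="NP_563617.1"
'''

C_ELEGANS = '''
gene 43733..44677
/gene="1A519"
/locus_tag="1A519"
/synonym="Y74C9A.1"
/note="Title: Caenorhabditis elegans expressed gene
1A519."
CDS
join(43733..43961,44030..44234,44281..44328,44521..44677)
/gene="1A519"
/locus_tag="1A519"
/codon_start=1
/product="putative protein (1A519)"
'''


def parse_features(text):
    """Return a list of (feature_key, qualifiers) pairs.

    qualifiers maps a qualifier name to the list of its raw values.
    """
    features = []
    open_qual = None  # quoted qualifier still waiting for its closing quote
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue
        if open_qual is not None:
            # Continuation of a quoted value that wrapped onto this line
            features[-1][1][open_qual][-1] += " " + line
            if line.endswith('"'):
                open_qual = None
            continue
        match = QUALIFIER.match(line)
        if match and features:
            name, value = match.groups()
            features[-1][1].setdefault(name, []).append(value)
            closed = len(value) > 1 and value.endswith('"')
            if value.startswith('"') and not closed:
                open_qual = name
            continue
        match = FEATURE.match(line)
        if match:
            features.append((match.group(1), {}))
        # Any other line (a wrapped location, for example) is ignored
    return features


def gene_name(qualifiers):
    """Prefer /gene; fall back to /locus_tag; None if neither is present."""
    for name in ("gene", "locus_tag"):
        if name in qualifiers:
            return qualifiers[name][0].strip('"')
    return None


if __name__ == "__main__":
    for label, text in (("Arabidopsis", ARABIDOPSIS), ("C. elegans", C_ELEGANS)):
        for key, qualifiers in parse_features(text):
            print(label, key, gene_name(qualifiers))
```

Whether "/locus_tag" is an acceptable substitute for a gene name depends on what the downstream analysis expects, so the fallback order in gene_name is an assumption that may need adjusting. In practice an existing GenBank parser would be a safer choice than hand-written line matching.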