[Bioperl-l] Fwd: [Genbank-bb] Change to sequence display formats : Removal of GIs by June 2016
Fields, Christopher J
cjfields at illinois.edu
Fri Jun 26 23:46:19 UTC 2015
Something to keep in mind if parsing breaks (though we should be okay). I’m more concerned about BLAST+ XML changes...
chris
Begin forwarded message:
From: "Cavanaugh, Mark (NIH/NLM/NCBI) [E]" <cavanaug at ncbi.nlm.nih.gov>
To: "'genbankb at net.bio.net' (genbankb at net.bio.net)" <genbankb at magpie.bio.indiana.edu>
Date: June 26, 2015 at 5:13:50 PM CDT
Subject: [Genbank-bb] Change to sequence display formats : Removal of GIs by June 2016
Greetings GenBank Users,
A very significant change which impacts the GenBank, GenPept, and FASTA
display formats for sequence records at NCBI was announced in the June 2015
GenBank release notes : The removal of GI sequence identifiers.
This change could have many impacts, so it seems prudent to announce it
independently, to ensure that as many users are aware of the change as
possible. So Section 1.4.1 of the June release notes is reproduced below.
Mark Cavanaugh
GenBank
NCBI/NLM/NIH/HHS
1.4.1 GI sequence identifiers to be removed from GenBank/GenPept/FASTA formats
As of 06/15/2016, the integer sequence identifiers known as "GIs" will no
longer be included in the GenBank, GenPept, and FASTA formats supported by
NCBI for the display of sequence records.
As first described in the Release Notes for GenBank 199.0 in December 2013,
NCBI is in the process of moving to storage solutions which utilize only
Accession.Version identifiers. See Section 1.4.2 of these release notes for
additional background information about those developments.
Although GI sequence identifiers served their purpose well for many years,
the Accession.Version system is completely equivalent (and much more
human-readable).
And given the shift to non-GI-based systems, the importance of using
Accession.Version identifiers cannot be overstated. So as an initial step, NCBI
will cease the display of GI identifiers in the flatfile and FASTA views of
all sequence records.
Previously-assigned GI identifiers will continue to exist 'behind the scenes',
and NCBI services (including URLs, APIs, etc) which accept GIs as inputs/arguments
will be supported, for those sequence records that have GIs, for the foreseeable
future.
Over the next year NCBI will identify all such services that do not yet
support Accession.Version identifiers, and add that support. Users of those
services will then be encouraged to make use of Accession.Version rather than GIs.
Of course, for those services that already support Accession.Version, NCBI
encourages users to begin transitioning away from GI as soon as is practical.
In the sample record below, nucleotide sequence AF123456 has been assigned a
GI of 6633795, and the protein translation of its coding region feature has
been assigned a GI of 6633796 :
LOCUS       AF123456                1510 bp    mRNA    linear   VRT 12-APR-2012
DEFINITION  Gallus gallus doublesex and mab-3 related transcription factor 1
            (DMRT1) mRNA, partial cds.
ACCESSION   AF123456
VERSION     AF123456.2  GI:6633795
....
     CDS             <1..936
                     /gene="DMRT1"
                     /note="cDMRT1"
                     /codon_start=1
                     /product="doublesex and mab-3 related transcription factor
                     1"
                     /protein_id="AAF19666.1"
                     /db_xref="GI:6633796"
                     /translation="PAAGKKLPRLPKCARCRNHGYSSPLKGHKRFCMWRDCQCKKCSL
                     IAERQRVMAVQVALRRQQAQEEELGISHPVPLPSAPEPVVKKSSSSSSCLLQDSSSPA
                     HSTSTVAAAAASAPPEGRMLIQDIPSIPSRGHLESTSDLVVDSTYYSSFYQPSLYPYY
                     NNLYNYSQYQMAVATESSSSETGGTFVGSAMKNSLRSLPATYMSSQSGKQWQMKGMEN
                     RHAMSSQYRMCSYYPPTSYLGQGVGSPTCVTQILASEDTPSYSESKARVFSPPSSQDS
                     GLGCLSSSESTKGDLECEPHQEPGAFAVSPVLEGE"
After June 15 2016, the GI value on the VERSION line and the GI /db_xref
qualifier for the coding region feature will no longer be displayed:
LOCUS       AF123456                1510 bp    mRNA    linear   VRT 12-APR-2012
DEFINITION  Gallus gallus doublesex and mab-3 related transcription factor 1
            (DMRT1) mRNA, partial cds.
ACCESSION   AF123456
VERSION     AF123456.2
....
     CDS             <1..936
                     /gene="DMRT1"
                     /note="cDMRT1"
                     /codon_start=1
                     /product="doublesex and mab-3 related transcription factor
                     1"
                     /protein_id="AAF19666.1"
                     /translation="PAAGKKLPRLPKCARCRNHGYSSPLKGHKRFCMWRDCQCKKCSL
                     IAERQRVMAVQVALRRQQAQEEELGISHPVPLPSAPEPVVKKSSSSSSCLLQDSSSPA
                     HSTSTVAAAAASAPPEGRMLIQDIPSIPSRGHLESTSDLVVDSTYYSSFYQPSLYPYY
                     NNLYNYSQYQMAVATESSSSETGGTFVGSAMKNSLRSLPATYMSSQSGKQWQMKGMEN
                     RHAMSSQYRMCSYYPPTSYLGQGVGSPTCVTQILASEDTPSYSESKARVFSPPSSQDS
                     GLGCLSSSESTKGDLECEPHQEPGAFAVSPVLEGE"
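For anyone comparing archived flatfiles against freshly downloaded ones, the two differences described above (the GI on the VERSION line and the GI /db_xref qualifier) can be normalized away before diffing. The following is only a rough Python sketch, not anything NCBI or BioPerl provides: the function name strip_gi is made up for illustration, and it assumes the GI always appears as "GI:<digits>" at the end of the VERSION line and that a GI /db_xref sits on a qualifier line of its own, as in the sample record.

```python
import re

# Assumption: the GI is the last token on the VERSION line, written "GI:<digits>".
_VERSION_GI = re.compile(r"^(VERSION\s+\S+)\s+GI:\d+\s*$")
# Assumption: a GI cross-reference occupies a qualifier line by itself.
_DBXREF_GI = re.compile(r'^\s*/db_xref="GI:\d+"\s*$')


def strip_gi(flatfile_text):
    """Return the flatfile text with GI values removed.

    Mirrors the two display changes in the announcement: the VERSION line
    keeps only Accession.Version, and GI /db_xref qualifiers are dropped.
    Other /db_xref qualifiers are left untouched.
    """
    kept = []
    for line in flatfile_text.splitlines():
        if _DBXREF_GI.match(line):
            continue  # drop the GI /db_xref qualifier line entirely
        kept.append(_VERSION_GI.sub(r"\1", line))
    return "\n".join(kept)
```

Running an old-style record through strip_gi should then give text that can be compared line by line with the post-change display, at least for the fields shown in the sample.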
Similarly, the GI value will be removed from the VERSION line of the GenPept
format. Currently:
LOCUS       AAF19666                 311 aa            linear   VRT 12-APR-2012
DEFINITION  doublesex and mab-3 related transcription factor 1, partial [Gallus
            gallus].
ACCESSION   AAF19666
VERSION     AAF19666.1  GI:6633796
DBSOURCE    accession AF123456.2
....
     CDS             1..311
                     /gene="DMRT1"
                     /coded_by="AF123456.2:<1..936"
As of 06/15/2016:
LOCUS       AAF19666                 311 aa            linear   VRT 12-APR-2012
DEFINITION  doublesex and mab-3 related transcription factor 1, partial [Gallus
            gallus].
ACCESSION   AAF19666
VERSION     AAF19666.1
DBSOURCE    accession AF123456.2
....
     CDS             1..311
                     /gene="DMRT1"
                     /coded_by="AF123456.2:<1..936"
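Parsers that expect a GI on the VERSION line are the ones most likely to break. A tolerant approach is to treat the GI as optional, so the same code reads both the current and the post-change display for GenBank and GenPept records. Below is a minimal Python sketch under that assumption; parse_version_line is a hypothetical helper written for this note, not a BioPerl or NCBI function, and the regular expression only covers the VERSION line shapes shown in the samples above.

```python
import re

# The trailing "GI:<digits>" group is optional, so both display styles match.
# Assumption: accessions consist of letters, digits and underscores only.
_VERSION_LINE = re.compile(
    r"^VERSION\s+(?P<acc>[A-Za-z0-9_]+)\.(?P<ver>\d+)(?:\s+GI:(?P<gi>\d+))?\s*$"
)


def parse_version_line(line):
    """Return (accession, version, gi) from a VERSION line.

    gi is an int when the line still carries a GI and None otherwise.
    Raises ValueError if the line is not a recognizable VERSION line.
    """
    match = _VERSION_LINE.match(line)
    if match is None:
        raise ValueError("not a recognizable VERSION line: %r" % line)
    gi = match.group("gi")
    return (
        match.group("acc"),
        int(match.group("ver")),
        int(gi) if gi is not None else None,
    )
```

Downstream code can then key on the Accession.Version pair and use the GI only when it happens to be present.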
Note that the coding region feature for GenPept format has never included
the display of nucleotide GI values.
For FASTA format, GI values will be removed from the FASTA header/defline:
Currently:
>gi|6633795|gb|AF123456.2| Gallus gallus doublesex and mab-3 related transcription factor 1 (DMRT1) mRNA, partial cds
CCGGCGGCGGGCAAGAAGCTGCCGCGTCTGCCCAAGTGTGCCCGCTGCCGCAACCACGGCTACTCCTCGC
CGCTGAAGGGGCACAAGCGGTTCTGCATGTGGCGGGACTGCCAGTGCAAGAAGTGCAGCCTGATCGCCGA
[....]
>gi|6633796|gb|AAF19666.1| doublesex and mab-3 related transcription factor 1, partial [Gallus gallus]
PAAGKKLPRLPKCARCRNHGYSSPLKGHKRFCMWRDCQCKKCSLIAERQRVMAVQVALRRQQAQEEELGI
SHPVPLPSAPEPVVKKSSSSSSCLLQDSSSPAHSTSTVAAAAASAPPEGRMLIQDIPSIPSRGHLESTSD
LVVDSTYYSSFYQPSLYPYYNNLYNYSQYQMAVATESSSSETGGTFVGSAMKNSLRSLPATYMSSQSGKQ
WQMKGMENRHAMSSQYRMCSYYPPTSYLGQGVGSPTCVTQILASEDTPSYSESKARVFSPPSSQDSGLGC
LSSSESTKGDLECEPHQEPGAFAVSPVLEGE
As of 06/15/2016:
>gb|AF123456.2| Gallus gallus doublesex and mab-3 related transcription factor 1 (DMRT1) mRNA, partial cds
CCGGCGGCGGGCAAGAAGCTGCCGCGTCTGCCCAAGTGTGCCCGCTGCCGCAACCACGGCTACTCCTCGC
CGCTGAAGGGGCACAAGCGGTTCTGCATGTGGCGGGACTGCCAGTGCAAGAAGTGCAGCCTGATCGCCGA
[....]
>gb|AAF19666.1| doublesex and mab-3 related transcription factor 1, partial [Gallus gallus]
PAAGKKLPRLPKCARCRNHGYSSPLKGHKRFCMWRDCQCKKCSLIAERQRVMAVQVALRRQQAQEEELGI
SHPVPLPSAPEPVVKKSSSSSSCLLQDSSSPAHSTSTVAAAAASAPPEGRMLIQDIPSIPSRGHLESTSD
LVVDSTYYSSFYQPSLYPYYNNLYNYSQYQMAVATESSSSETGGTFVGSAMKNSLRSLPATYMSSQSGKQ
WQMKGMENRHAMSSQYRMCSYYPPTSYLGQGVGSPTCVTQILASEDTPSYSESKARVFSPPSSQDSGLGC
LSSSESTKGDLECEPHQEPGAFAVSPVLEGE
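Code that splits the FASTA defline on "|" and assumes the first two fields are "gi" and a number will misread the new headers. One way to stay compatible is to peel off the gi prefix only when it is present. The Python sketch below illustrates the idea; parse_defline is a hypothetical name, and it handles only the two header shapes shown in this announcement (with and without the leading gi|<number>| pair), not every identifier syntax NCBI has used.

```python
def parse_defline(defline):
    """Split an NCBI-style defline into (gi, db, accession_version, description).

    gi is an int for old-style headers and None when the header has no
    gi prefix. A leading ">" is accepted but not required.
    """
    text = defline.lstrip(">")
    # The identifier block ends at the first space; the rest is description.
    ident, _, description = text.partition(" ")
    fields = ident.split("|")

    gi = None
    if fields[0] == "gi" and len(fields) >= 2:
        gi = int(fields[1])
        fields = fields[2:]  # remaining fields start at the database tag

    if len(fields) < 2 or not fields[1]:
        raise ValueError("unrecognized defline: %r" % defline)

    db, accession_version = fields[0], fields[1]
    return gi, db, accession_version, description.strip()
```

With this approach the Accession.Version field comes back in the same position for both header styles, so callers do not need to know which format a given file was downloaded in.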
Please direct any inquiries about these changes to the NCBI Service Desk:
info at ncbi.nlm.nih.gov
_______________________________________________
Genbankb mailing list
Genbankb at net.bio.net
http://www.bio.net/biomail/listinfo/genbankb