[Bioperl-l] Refseq Version

Smithies, Russell Russell.Smithies at agresearch.co.nz
Sun Feb 7 20:59:28 UTC 2010


I should have known it would break the formatting :-(

Try this:

Release 1:June 30, 2003;Release Size: 4672871949 bases, 263588685 amino acids, 1061675 records
Release 2:October 21, 2003;Release Size: 2124 organisms, 7745398573 nucleotide bases, 286957682 amino acids, 1097404 records
Release 3:January 13, 2004;Release Size: 2218 organisms, 7992741222 nucleotide bases, 294647847 amino acids, 1101244 records
Release 4:March 24, 2004;Release Size: 2358 organisms, 8175128887 nucleotide bases, 318253841 amino acids, 1193457 records
Release 5:May 2, 2004;Release Size: 2395 organisms, 8325515623 nucleotide bases, 337229387 amino acids, 1255613 records
Release 6:July 5, 2004;Release Size: 2467 organisms, 8696371716 nucleotide bases, 365446682 amino acids, 1367206 records
Release 7:September 12, 2004;Release Size: 2558 organisms, 21072808460 nucleotide bases, 405233619 amino acids, 1579579 records
Release 8:October 31, 2004;Release Size: 2645 organisms, 26814386658 nucleotide bases, 430300369 amino acids, 1709723 records
Release 9:January 9, 2005;Release Size: 2780 organisms, 36786975473 nucleotide bases, 470534907 amino acids, 1843944 records
Release 10:March 6, 2005;Release Size: 2827 organisms, 36893741150 nucleotide bases, 482862858 amino acids, 1893478 records
Release 11:May 8, 2005;Release Size: 2928 organisms, 39731702362 nucleotide bases, 507980644 amino acids, 2477893 records
Release 12:July 10, 2005;Release Size: 2969 organisms, 43043256058 nucleotide bases, 608493108 amino acids, 2869675 records
Release 13:September 11, 2005;Release Size: 3060 organisms, 44727484853 nucleotide bases, 686768902 amino acids, 3400773 records
Release 14:November 20, 2005;Release Size: 3198 organisms, 47364955367 nucleotide bases, 763761075 amino acids, 3272776 records
Release 15:January 1, 2006;Release Size: 3244 organisms, 52645441913 nucleotide bases, 810009733 amino acids, 3436263 records
Release 16:March 11, 2006;Release Size: 3397 organisms, 56175443059 nucleotide bases, 887509001 amino acids, 3715260 records
Release 17:May 1, 2006;Release Size: 3497 organisms, 62130037371 nucleotide bases, 927587669 amino acids, 3999859 records
Release 18:July 11, 2006;Release Size: 3695 organisms, 70474041999 nucleotide bases, 974374765 amino acids, 4186692 records
Release 19:September 10, 2006;Release Size: 3774 organisms, 70694879544 nucleotide bases, 1012985077 amino acids, 4311543 records
Release 20:November 5, 2006;Release Size: 3919 organisms, 72679681505 nucleotide bases, 1061797276 amino acids, 4567569 records
Release 21:January 6, 2007;Release Size: 4079 organisms, 73864990566 nucleotide bases, 1144795927 amino acids, 4742335 records
Release 22:March 5, 2007;Release Size: 4187 organisms, 82441128546 nucleotide bases, 1215085694 amino acids, 5207865 records
Release 23:May 8, 2007;Release Size: 4300 organisms, 83148327110 nucleotide bases, 1291050995 amino acids, 5503385 records
Release 24:July 10, 2007;Release Size: 4511 organisms, 89856995521 nucleotide bases, 1365916222 amino acids, 6073814 records
Release 25:September 11, 2007;Release Size: 4646 organisms, 91265840843 nucleotide bases, 1470475398 amino acids, 6515132 records
Release 26:November 4, 2007;Release Size: 4737 organisms, 99105705485 nucleotide bases, 1495032507 amino acids, 6698250 records
Release 27:January 6, 2008;Release Size: 4926 organisms, 101059552113 nucleotide bases, 1556356987 amino acids, 7025715 records
Release 28:March 9, 2008;Release Size: 5059 organisms, 102051350525 nucleotide bases, 1770627427 amino acids, 7914560 records
Release 29:May 4, 2008;Release Size: 5168 organisms, 104671101150 nucleotide bases, 1870214220 amino acids, 8376141 records
Release 30:July 7, 2008;Release Size: 5395 organisms, 105074486709 nucleotide bases, 1913447691 amino acids, 8572852 records
Release 31:August 30, 2008;Release Size: 5513 organisms, 109214348591 nucleotide bases, 2026768719 amino acids, 9145702 records
Release 32:November 10, 2008;Release Size: 5726 organisms, 111122203221 nucleotide bases, 2089596746 amino acids, 9501764 records
Release 33:January 16, 2009;Release Size: 7773 organisms, 116001583818 nucleotide bases, 2204073443 amino acids, 10325282 records
Release 34:March 6, 2009;Release Size: 8054 organisms, 111792574830 nucleotide bases, 2299682138 amino acids, 10021870 records
Release 35:May 4, 2009;Release Size: 8393 organisms, 113210655336 nucleotide bases, 2565199170 amino acids, 10993891 records
Release 36:July 2, 2009;Release Size: 8665 organisms, 117013741530 nucleotide bases, 2756884219 amino acids, 12141825 records
Release 37:September 3, 2009;Release Size: 9005 organisms, 119151229820 nucleotide bases, 2965450333 amino acids, 12941750 records
Release 38:November 7, 2009;Release Size: 9166 organisms, 119196622435 nucleotide bases, 3115246540 amino acids, 13436447 records
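Because the list above is one release per line, it is easy to load programmatically. A minimal Python sketch, assuming every line follows the `Release N:Month D, YYYY;Release Size: ...` pattern shown (the regular expressions and the `parse_release_line` name are illustrative only, not part of any NCBI or BioPerl tool):

```python
import re
from datetime import datetime

# "Release 5:May 2 , 2004;Release Size: 2395 organisms, ..." -> number, date, size text.
# The \s* around the punctuation tolerates the irregular spacing seen in the list.
LINE_RE = re.compile(
    r"Release\s+(\d+)\s*:\s*([A-Za-z]+)\s+(\d{1,2})\s*,\s*(\d{4})\s*;"
    r"\s*Release Size:\s*(.*)"
)
# "2395 organisms", "8325515623 nucleotide bases", "4672871949 bases", ...
FIELD_RE = re.compile(r"(\d+)\s+(organisms|nucleotide bases|bases|amino acids|records)")


def parse_release_line(line):
    """Parse one release line into a dict, or return None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    number, month, day, year, sizes = m.groups()
    record = {
        "release": int(number),
        "date": datetime.strptime(f"{month} {day} {year}", "%B %d %Y").date(),
    }
    for value, label in FIELD_RE.findall(sizes):
        # Release 1 says just "bases"; the table version lists it under Nucleotide Bases.
        key = "nucleotide bases" if label == "bases" else label
        record[key] = int(value)
    return record


line = ("Release 38:November 7, 2009;Release Size: 9166 organisms, "
        "119196622435 nucleotide bases, 3115246540 amino acids, 13436447 records")
print(parse_release_line(line))
```

Lines that don't match (such as "Try this:") return `None`, so the whole message body can be fed through the function and the `None` results filtered out.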



> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> Sent: Monday, 8 February 2010 9:47 a.m.
> To: 'shalu sharma'
> Cc: 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] Refseq Version
>
> Release 39 came out on Jan 30, and according to the README, releases only come out
> in odd months (January, March, May, July, September, November).
> The stats file is here:
> ftp://ftp.ncbi.nih.gov/refseq/release/release-statistics/RefSeq-release39.01232010.stats.txt
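The stats file name appears to carry both the release number and an eight-digit date. A small Python sketch that pulls them out, assuming the `RefSeq-release<N>.<MMDDYYYY>.stats.txt` pattern (inferred from this single file name; `parse_stats_filename` is a hypothetical helper). Note that the date in the name (01232010, i.e. Jan 23) differs from the Jan 30 release date given in the message, so it may be when the statistics were generated rather than the public release date:

```python
import re
from datetime import datetime

# Assumed pattern, inferred from "RefSeq-release39.01232010.stats.txt":
# a release number, then an eight-digit date read as MMDDYYYY.
STATS_NAME_RE = re.compile(r"RefSeq-release(\d+)\.(\d{8})\.stats\.txt$")


def parse_stats_filename(name):
    """Return (release_number, date) from a stats file name or URL, else None."""
    m = STATS_NAME_RE.search(name)
    if m is None:
        return None
    release, stamp = m.groups()
    return int(release), datetime.strptime(stamp, "%m%d%Y").date()


url = ("ftp://ftp.ncbi.nih.gov/refseq/release/release-statistics/"
       "RefSeq-release39.01232010.stats.txt")
print(parse_stats_filename(url))
```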
>
> The sequence counts in the FASTA release and the pre-built BLAST
> databases seem to differ, but I guess only NCBI can explain that.
> I can't see any way of extracting the release number from the pre-built
> BLAST databases (apart from the build date), but it might be worth asking
> NCBI whether they would include that information in future releases.
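Since the build date is the only release-related clue in the pre-built databases, one heuristic is to take the latest release dated on or before that build date. A Python sketch of that lookup, assuming a database is always built from the most recent release (an assumption, not something NCBI states in this thread). `guess_release` is a hypothetical helper; the dates for releases 35 to 38 come from the release list in this thread, and release 39 uses the Jan 30, 2010 date given above:

```python
import bisect
from datetime import date

# (release date, release number), sorted by date. Earlier releases can be
# appended from the list in this thread; the list must stay sorted.
RELEASE_DATES = [
    (date(2009, 5, 4), 35),
    (date(2009, 7, 2), 36),
    (date(2009, 9, 3), 37),
    (date(2009, 11, 7), 38),
    (date(2010, 1, 30), 39),
]


def guess_release(build_date, releases=RELEASE_DATES):
    """Return the latest release dated on or before build_date (heuristic),
    or None if the build predates every known release."""
    dates = [d for d, _ in releases]
    i = bisect.bisect_right(dates, build_date)  # count of releases <= build_date
    if i == 0:
        return None
    return releases[i - 1][1]


print(guess_release(date(2010, 1, 30)))  # 39
```

This only narrows things down: a database rebuilt from an older release, or a release whose files appear days after its nominal date, would fool it.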
>
>
> FYI, here's the old release stats.
> (I wget'ed and grep'ed all the stats files)
>
> Release  Date    Year  Organisms  Nucleotide Bases    Amino Acids     Records
>       1  Jun-30  2003                4,672,871,949    263,588,685   1,061,675
>       2  Oct-21  2003      2,124     7,745,398,573    286,957,682   1,097,404
>       3  Jan-13  2004      2,218     7,992,741,222    294,647,847   1,101,244
>       4  Mar-24  2004      2,358     8,175,128,887    318,253,841   1,193,457
>       5  May-02  2004      2,395     8,325,515,623    337,229,387   1,255,613
>       6  Jul-05  2004      2,467     8,696,371,716    365,446,682   1,367,206
>       7  Sep-12  2004      2,558    21,072,808,460    405,233,619   1,579,579
>       8  Oct-31  2004      2,645    26,814,386,658    430,300,369   1,709,723
>       9  Jan-09  2005      2,780    36,786,975,473    470,534,907   1,843,944
>      10  Mar-06  2005      2,827    36,893,741,150    482,862,858   1,893,478
>      11  May-08  2005      2,928    39,731,702,362    507,980,644   2,477,893
>      12  Jul-10  2005      2,969    43,043,256,058    608,493,108   2,869,675
>      13  Sep-11  2005      3,060    44,727,484,853    686,768,902   3,400,773
>      14  Nov-20  2005      3,198    47,364,955,367    763,761,075   3,272,776
>      15  Jan-01  2006      3,244    52,645,441,913    810,009,733   3,436,263
>      16  Mar-11  2006      3,397    56,175,443,059    887,509,001   3,715,260
>      17  May-01  2006      3,497    62,130,037,371    927,587,669   3,999,859
>      18  Jul-11  2006      3,695    70,474,041,999    974,374,765   4,186,692
>      19  Sep-10  2006      3,774    70,694,879,544  1,012,985,077   4,311,543
>      20  Nov-05  2006      3,919    72,679,681,505  1,061,797,276   4,567,569
>      21  Jan-06  2007      4,079    73,864,990,566  1,144,795,927   4,742,335
>      22  Mar-05  2007      4,187    82,441,128,546  1,215,085,694   5,207,865
>      23  May-08  2007      4,300    83,148,327,110  1,291,050,995   5,503,385
>      24  Jul-10  2007      4,511    89,856,995,521  1,365,916,222   6,073,814
>      25  Sep-11  2007      4,646    91,265,840,843  1,470,475,398   6,515,132
>      26  Nov-04  2007      4,737    99,105,705,485  1,495,032,507   6,698,250
>      27  Jan-06  2008      4,926   101,059,552,113  1,556,356,987   7,025,715
>      28  Mar-09  2008      5,059   102,051,350,525  1,770,627,427   7,914,560
>      29  May-04  2008      5,168   104,671,101,150  1,870,214,220   8,376,141
>      30  Jul-07  2008      5,395   105,074,486,709  1,913,447,691   8,572,852
>      31  Aug-30  2008      5,513   109,214,348,591  2,026,768,719   9,145,702
>      32  Nov-10  2008      5,726   111,122,203,221  2,089,596,746   9,501,764
>      33  Jan-16  2009      7,773   116,001,583,818  2,204,073,443  10,325,282
>      34  Mar-06  2009      8,054   111,792,574,830  2,299,682,138  10,021,870
>      35  May-04  2009      8,393   113,210,655,336  2,565,199,170  10,993,891
>      36  Jul-02  2009      8,665   117,013,741,530  2,756,884,219  12,141,825
>      37  Sep-03  2009      9,005   119,151,229,820  2,965,450,333  12,941,750
>      38  Nov-07  2009      9,166   119,196,622,435  3,115,246,540  13,436,447
>
>
>
> --Russell
>
>
> From: shalu sharma [mailto:sharmashalu.bio at gmail.com]
> Sent: Saturday, 6 February 2010 3:56 a.m.
> To: Smithies, Russell
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Refseq Version
>
> Hi Russell,
> Thanks for your response.
> I am getting the number of sequences in the database but not the release
> number (like 38 or 39).
> This is what I did:
>
> $ fastacmd -I -d /db/ncbiblast/refseq/refseq_protein
> Database: NCBI Protein Reference Sequences
>            7,585,993 sequences; 2,644,770,521 total letters
>
> File names:
> /db/ncbiblast/refseq/refseq_protein.00
>    Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 36,805 res
> /db/ncbiblast/refseq/refseq_protein.01
>    Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 33,403 res
> /db/ncbiblast/refseq/refseq_protein.02
>    Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 15,830 res
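The `Date:` lines printed by `fastacmd -I` carry the database build date, which is the closest thing to a release marker in the pre-built files. A minimal Python sketch that extracts them, assuming the line layout shown in this output (`volume_build_dates` is a hypothetical helper, and the sample text is copied from the output above):

```python
import re
from datetime import datetime

# Matches per-volume lines printed by `fastacmd -I`, e.g.
#    Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 36,805 res
DATE_RE = re.compile(
    r"Date:\s+([A-Z][a-z]{2}\s+\d{1,2},\s+\d{4})\s+(\d{1,2}:\d{2}\s+[AP]M)"
)


def volume_build_dates(fastacmd_output):
    """Return the set of distinct build timestamps found in `fastacmd -I` output."""
    stamps = set()
    for day, clock in DATE_RE.findall(fastacmd_output):
        # Collapse runs of whitespace so strptime sees "Jan 30, 2010 8:34 PM".
        text = " ".join(f"{day} {clock}".split())
        stamps.add(datetime.strptime(text, "%b %d, %Y %I:%M %p"))
    return stamps


sample = """\
File names:
/db/ncbiblast/refseq/refseq_protein.00
   Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 36,805 res
/db/ncbiblast/refseq/refseq_protein.01
   Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 33,403 res
"""
print(volume_build_dates(sample))
```

To use it on a live database, one could capture the output of `fastacmd -I -d /db/ncbiblast/refseq/refseq_protein` (for example with Python's `subprocess.run(..., capture_output=True, text=True)`) and pass its `stdout` in. A single distinct timestamp suggests all volumes were built together.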
>
> I am still confused about how I can get the release number. I know RefSeq 39
> was released on Jan 30, 2010, but I don't know how to confirm this. I also
> tried looking at the RefSeq release file but was not able to find anything.
>
> I would really appreciate it if anyone could help me out with this.
>
> Thanks
> Shalu
>
> On Thu, Feb 4, 2010 at 6:39 PM, Smithies, Russell
> <Russell.Smithies at agresearch.co.nz> wrote:
> If you have access to the BLAST database, use fastacmd -I -d databasename.
> Otherwise, the version information is usually at the bottom of your BLAST result.
>
> --Russell
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of shalu sharma
> > Sent: Friday, 5 February 2010 11:02 a.m.
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Refseq Version
> >
> > Hi All,
> >       This is not a bioperl query.
> > Is there any way to check the RefSeq version (release)? I am using
> > a server to BLAST my sequences (blastall) against RefSeq. Is there any way I
> > can get the version information for the RefSeq database (from the BLAST
> > output file or directly from the database)?
> >
> > Thanks
> > Shalu
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l



