[Bioperl-l] Refseq Version

Smithies, Russell Russell.Smithies at agresearch.co.nz
Sun Feb 7 20:47:00 UTC 2010


Release 39 came out on Jan 30, and according to the README, releases only come out in odd-numbered months (January, March, May, July, September, November).
The stats file is here: ftp://ftp.ncbi.nih.gov/refseq/release/release-statistics/RefSeq-release39.01232010.stats.txt
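
If you want to fetch the stats files for other releases, the file name appears to combine the release number with an MMDDYYYY date stamp. The sketch below rebuilds the URL on that assumption; the pattern is inferred from the single link above, the function name stats_url is only illustrative, and the stamp has to be supplied by hand because it need not match the announced release date.

```python
from datetime import date

# Directory taken from the link quoted in this message.
BASE = "ftp://ftp.ncbi.nih.gov/refseq/release/release-statistics/"


def stats_url(release: int, stamp: date) -> str:
    """Build a stats-file URL, assuming the RefSeq-release<N>.<MMDDYYYY> pattern."""
    return f"{BASE}RefSeq-release{release}.{stamp:%m%d%Y}.stats.txt"


# Reproduces the release 39 link given above.
print(stats_url(39, date(2010, 1, 23)))
```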

The number of sequences differs between the FASTA release and the pre-built BLAST databases, but I guess only NCBI can explain that.
I can't see any way of extracting the release number from the pre-built BLAST databases (apart from going by the build date), but it might be worth asking NCBI to include that information in future releases.
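
As a rough workaround, the build date can be matched against known release dates. The sketch below is only a heuristic: it assumes the pre-built databases are rebuilt from the most recent release, which nothing in this thread confirms. The dates are the ones quoted in this thread (only the latest few releases are listed), and guess_release is a made-up helper name.

```python
from bisect import bisect_right
from datetime import date

# (release number, release date) as quoted in this thread, oldest first.
RELEASES = [
    (36, date(2009, 7, 2)),
    (37, date(2009, 9, 3)),
    (38, date(2009, 11, 7)),
    (39, date(2010, 1, 30)),
]


def guess_release(build_date):
    """Return the latest listed release dated on or before build_date, or None."""
    dates = [released for _, released in RELEASES]
    index = bisect_right(dates, build_date)
    return RELEASES[index - 1][0] if index else None


print(guess_release(date(2010, 1, 30)))  # build date reported by fastacmd -I
```

A build date earlier than the first listed release returns None rather than a guess.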


FYI, here are the stats for the older releases.
(I wget'ed and grep'ed all the stats files.)

Release  Date    Year  Organisms  Nucleotide Bases    Amino Acids     Records
      1  Jun-30  2003          -     4,672,871,949    263,588,685   1,061,675
      2  Oct-21  2003      2,124     7,745,398,573    286,957,682   1,097,404
      3  Jan-13  2004      2,218     7,992,741,222    294,647,847   1,101,244
      4  Mar-24  2004      2,358     8,175,128,887    318,253,841   1,193,457
      5  May-02  2004      2,395     8,325,515,623    337,229,387   1,255,613
      6  Jul-05  2004      2,467     8,696,371,716    365,446,682   1,367,206
      7  Sep-12  2004      2,558    21,072,808,460    405,233,619   1,579,579
      8  Oct-31  2004      2,645    26,814,386,658    430,300,369   1,709,723
      9  Jan-09  2005      2,780    36,786,975,473    470,534,907   1,843,944
     10  Mar-06  2005      2,827    36,893,741,150    482,862,858   1,893,478
     11  May-08  2005      2,928    39,731,702,362    507,980,644   2,477,893
     12  Jul-10  2005      2,969    43,043,256,058    608,493,108   2,869,675
     13  Sep-11  2005      3,060    44,727,484,853    686,768,902   3,400,773
     14  Nov-20  2005      3,198    47,364,955,367    763,761,075   3,272,776
     15  Jan-01  2006      3,244    52,645,441,913    810,009,733   3,436,263
     16  Mar-11  2006      3,397    56,175,443,059    887,509,001   3,715,260
     17  May-01  2006      3,497    62,130,037,371    927,587,669   3,999,859
     18  Jul-11  2006      3,695    70,474,041,999    974,374,765   4,186,692
     19  Sep-10  2006      3,774    70,694,879,544  1,012,985,077   4,311,543
     20  Nov-05  2006      3,919    72,679,681,505  1,061,797,276   4,567,569
     21  Jan-06  2007      4,079    73,864,990,566  1,144,795,927   4,742,335
     22  Mar-05  2007      4,187    82,441,128,546  1,215,085,694   5,207,865
     23  May-08  2007      4,300    83,148,327,110  1,291,050,995   5,503,385
     24  Jul-10  2007      4,511    89,856,995,521  1,365,916,222   6,073,814
     25  Sep-11  2007      4,646    91,265,840,843  1,470,475,398   6,515,132
     26  Nov-04  2007      4,737    99,105,705,485  1,495,032,507   6,698,250
     27  Jan-06  2008      4,926   101,059,552,113  1,556,356,987   7,025,715
     28  Mar-09  2008      5,059   102,051,350,525  1,770,627,427   7,914,560
     29  May-04  2008      5,168   104,671,101,150  1,870,214,220   8,376,141
     30  Jul-07  2008      5,395   105,074,486,709  1,913,447,691   8,572,852
     31  Aug-30  2008      5,513   109,214,348,591  2,026,768,719   9,145,702
     32  Nov-10  2008      5,726   111,122,203,221  2,089,596,746   9,501,764
     33  Jan-16  2009      7,773   116,001,583,818  2,204,073,443  10,325,282
     34  Mar-06  2009      8,054   111,792,574,830  2,299,682,138  10,021,870
     35  May-04  2009      8,393   113,210,655,336  2,565,199,170  10,993,891
     36  Jul-02  2009      8,665   117,013,741,530  2,756,884,219  12,141,825
     37  Sep-03  2009      9,005   119,151,229,820  2,965,450,333  12,941,750
     38  Nov-07  2009      9,166   119,196,622,435  3,115,246,540  13,436,447



--Russell


From: shalu sharma [mailto:sharmashalu.bio at gmail.com]
Sent: Saturday, 6 February 2010 3:56 a.m.
To: Smithies, Russell
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Refseq Version

Hi Russell,
Thanks for your response.
I am getting the number of sequences in the database but not the release number (like 38, 39).
This is what I did:

$ fastacmd -I -d /db/ncbiblast/refseq/refseq_protein
Database: NCBI Protein Reference Sequences
           7,585,993 sequences; 2,644,770,521 total letters

File names:
/db/ncbiblast/refseq/refseq_protein.00
   Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 36,805 res
/db/ncbiblast/refseq/refseq_protein.01
   Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 33,403 res
/db/ncbiblast/refseq/refseq_protein.02
   Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 15,830 res
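
The build dates in this listing can be pulled out programmatically, for example to compare them against a list of release dates. The following is a minimal sketch that parses text in the layout shown above; the regular expression and the build_dates helper are assumptions based on this one sample, and the month abbreviation parsing relies on an English locale.

```python
import re
from datetime import datetime

# Shortened copy of the fastacmd -I output shown above.
SAMPLE = """Database: NCBI Protein Reference Sequences
           7,585,993 sequences; 2,644,770,521 total letters

File names:
/db/ncbiblast/refseq/refseq_protein.00
   Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 36,805 res
/db/ncbiblast/refseq/refseq_protein.01
   Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 33,403 res
"""

# Matches the "Date: Mon DD, YYYY" part of each volume line.
DATE_RE = re.compile(r"Date:\s+([A-Z][a-z]{2} \d{1,2}, \d{4})")


def build_dates(text):
    """Return the distinct build dates found in the listing, sorted."""
    found = {datetime.strptime(m, "%b %d, %Y").date() for m in DATE_RE.findall(text)}
    return sorted(found)


print([d.isoformat() for d in build_dates(SAMPLE)])
```

In practice the text would come from running fastacmd -I -d on the database rather than from a hard-coded string.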

I am still confused about how I can get the release number. I know RefSeq 39 was released on Jan 30, 2010, but I don't know how to confirm this. I also tried looking at the RefSeq release file but was not able to find anything.

I would really appreciate it if anyone can help me out with this.

Thanks
Shalu

On Thu, Feb 4, 2010 at 6:39 PM, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:
If you have access to the blast database, use fastacmd -I -d databasename
Otherwise, it's usually at the bottom of your blast result.

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of shalu sharma
> Sent: Friday, 5 February 2010 11:02 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Refseq Version
>
> Hi All,
>       This is not a bioperl query.
> Is there any way to check the RefSeq version (release)? I am using a
> server to BLAST my sequences (blastall) against RefSeq. Is there any way I
> can get the version information for the RefSeq database (from the BLAST
> file or directly from the database)?
>
> Thanks
> Shalu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l