[BioRuby] SRA downloader

Chris Fields cjfields at illinois.edu
Tue Feb 15 14:57:17 UTC 2011


On Feb 15, 2011, at 7:45 AM, Raoul Bonnal wrote:

> Hi Peter,
> On 15/feb/2011, at 14.18, Peter Cock wrote:
> 
>> http://fungalgenomes.org/blog/2011/02/a-hyphaltip-get-a-bunch-of-sra-data/
> thanks for the link.
> 
> Actually, DDBJ and EBI (ENA) have a repository for SRA datasets.
> The question is: do we need to support SRA, or is it just a convenient way for providers to store data?

Yes, I think there needs to be support for easier data retrieval, but of course only if there will be an SRA database to begin with.  All the feedback (both positive and negative) from Eisen's blog indicates the need for such a resource, albeit with an improved UI.

Unfortunately, the most significant problem we're seeing on the NCBI end is (I believe) a general lack of funding for ongoing and new projects; this has been going on for a while now. As Peter pointed out, the removal of SRA and other resources hasn't been made official (the link quotes an anonymous source).  However, given past cuts at NCBI and the newly released federal budget, I don't expect a stopgap that will prevent SRA and other NCBI resources from going away.

> EBI seems to me more computer friendly than the NIH site.

I generally find the same.

> If more providers switch to Aspera, there is a developer kit (SDK) http://www.asperasoft.com/en/products/fasp_SDK_9/fasp_SDK_9 that could be interesting to support.
> 
> Today I tested the download speed and it performs very well compared to normal ftp.
> 
> --
> Ra

I'm a bit confused by this last part. If we're talking about a simple FTP-like resource to pull data from, then I would suggest BioTorrents, which is specifically developed for handling large datasets.

However, a gzipped blob with an accession is useless without a way to attach meta-information to it (species, source, protocols, etc.) and a way to search through that information.  Most users will want a searchable and possibly curated repository to pull data from, or at least one that gives them accessions and links to the data.  There is a need for something like SRA whether NCBI or others decide to handle it.
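
To make that concrete, here is a rough Ruby sketch of the minimum I mean: an accession that carries its metadata and can be searched by it. Everything in it is hypothetical (the RunRecord and RunCatalog names, the fields, the made-up accessions and example.org URLs); it is not how SRA, ENA or BioRuby model this, just an illustration of the idea.

```ruby
# Hypothetical record: an accession plus the metadata that makes it useful.
# None of these names or fields come from SRA, ENA, or BioRuby.
RunRecord = Struct.new(:accession, :species, :source, :protocol, :url)

# Toy in-memory catalog, searchable on any metadata field.
class RunCatalog
  def initialize
    @records = []
  end

  # Add a record and return the catalog so calls can be chained.
  def add(record)
    @records << record
    self
  end

  # Return the records matching every criterion given
  # (case-insensitive substring match on the named field).
  def search(**criteria)
    @records.select do |rec|
      criteria.all? do |field, wanted|
        rec[field].to_s.downcase.include?(wanted.to_s.downcase)
      end
    end
  end
end

# Made-up entries for illustration only.
catalog = RunCatalog.new
catalog.add(RunRecord.new("EXAMPLE0001", "Neurospora crassa", "mycelium",
                          "RNA-Seq", "ftp://example.org/EXAMPLE0001.fastq.gz"))
catalog.add(RunRecord.new("EXAMPLE0002", "Aspergillus nidulans", "conidia",
                          "ChIP-Seq", "ftp://example.org/EXAMPLE0002.fastq.gz"))

catalog.search(species: "neurospora").each do |rec|
  puts "#{rec.accession}\t#{rec.url}"
end
```

Without the metadata fields, the only query left is "give me this accession", which assumes the user already knows what they are looking for.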

chris


