[Bioperl-l] UCSC database backend
Chris Fields
cjfields at uiuc.edu
Wed Aug 9 19:21:57 UTC 2006
...
> Before we get too far down this line of thought, keep in mind that this will
> be dozens of Gb of sequence and database tables. See here for details:
>
> http://genome.ucsc.edu/admin/mirror.html
>
> The sequences include all of genbank, essentially. The mysql tables ALONE
> (no sequence) for only ONE human assembly is on the order of 10Gb--not the
> kind of thing you can download in a few minutes (or even hours). Just to
> keep in mind....
Yes, there was a recent bug related to the packing order for very large
files (>4 GB, I believe). I'm hoping Lincoln takes a look at it soon and
offers further suggestions, since the proposed changes would require
reindexing everything. That said, the proposed fix did work well for the
submitter.
> On another point, the strength of UCSC is not in obtaining sequence, but in
> mapping to the genome. I think getting actual sequence should be secondary
> here, if for no other reason than there are trivially easy ways of getting
> sequence information from elsewhere given an accession or ID. There is
> simply too much information to be stored locally for most people and getting
> the data remotely from UCSC doesn't seem possible currently.
>
> Sean
Then we could use this primarily to return location and other information
instead. Anyone interested in sequence can use the location info to
retrieve sequences remotely (via Bio::DB::GenBank or similar) or locally
(via Bio::DB::Fasta).
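To make the "location first, sequence second" idea concrete, here is a rough, language-neutral sketch in plain Python rather than BioPerl. It assumes a backend has already returned a location (sequence ID, start, end, strand) and then pulls the matching subsequence out of a local FASTA source. The names read_fasta and fetch_region, the toy record chrDemo, and the 1-based inclusive coordinate convention are all assumptions made for illustration; none of this is the API of Bio::DB::Fasta or of any UCSC interface.

```python
import io

# Translation table for complementing DNA (illustrative; only ACGT handled).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_fasta(handle):
    """Parse FASTA text from a file-like object into {sequence_id: sequence}."""
    records = {}
    seq_id = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if seq_id is not None:
                records[seq_id] = "".join(chunks)
            seq_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if seq_id is not None:
        records[seq_id] = "".join(chunks)
    return records


def fetch_region(records, seq_id, start, end, strand="+"):
    """Return the subsequence for a location.

    Assumes 1-based, inclusive coordinates; a '-' strand yields the
    reverse complement.
    """
    seq = records[seq_id]
    if start < 1 or end < start or end > len(seq):
        raise ValueError("location falls outside the sequence")
    sub = seq[start - 1:end]
    if strand == "-":
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub


# Toy data standing in for a local FASTA file.
demo = io.StringIO(">chrDemo toy sequence\nACGTACGTAA\nGGCC\n")
records = read_fasta(demo)

print(fetch_region(records, "chrDemo", 9, 12))       # AAGG
print(fetch_region(records, "chrDemo", 9, 12, "-"))  # CCTT
```

This toy version reads everything into memory, which would not scale to whole genomes; an indexed reader such as Bio::DB::Fasta exists precisely to avoid that. Locations coming from a source that uses a different coordinate convention would need converting before being passed in.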
The key is to get this set up in some basic form so that people can start
using it, make suggestions, etc. Sendu, any suggestions?
Chris