[Bioperl-l] UCSC database backend

Wed Aug 9 14:50:53 UTC 2006

On 8/9/06 10:41 AM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> Sean, 
> 
> If you have your CVS account set up you could go ahead and add it in.  I
> think the plan is to try and include this in the next dev release (1.5.5),
> which we are trying to get out by end-Sept at the latest.  I think a few RCs
> may be made beforehand, but that's really up to the pumpkin.
> 
> As RandomAccessI is already available, we could use that as a start to
> implement sequence retrieval.  Other interfaces would be added over time to
> round out getting data into the proper Bio* objects.

Chris,

Once I get CVS access, I will commit what I have done (as long as it
"works").  

Now for the details.  Keep in mind that for many of the "sequences"
available from UCSC, there is no actual "sequence" stored in the database;
rather, the sequence data live in flat files not accessible directly via SQL.
Therefore, a sequence would be "abstract" in the sense of being a "join
location" on the chromosome, and even that isn't quite right, as the mRNA
sequence != genomic alignment sequence.  Also, there are many different
tables that maintain "sequence" information.  So, implementing RandomAccessI
is not going to be straightforward and will require some assumptions about
what will be searched.  In fact, since the same "sequence" can be in many
different tables, there may need to be a way of specifying which table(s)
the search should cover.
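
As a rough illustration of the point above (Python rather than BioPerl, and
only a sketch under stated assumptions): the lookup below returns a "join
location" (chromosome, strand, exon blocks) rather than residues, and lets
the caller say which tables to search.  The table names refGene and
knownGene and the comma-separated exonStarts/exonEnds columns follow the
UCSC annotation layout; the in-memory SQLite database, the reduced column
set, the demo rows, and the function names are all hypothetical.

```python
import sqlite3

# Hypothetical allowlist of tables to search; the real UCSC schema has many more.
SEARCHABLE_TABLES = ("refGene", "knownGene")


def build_demo_db():
    """Create an in-memory stand-in with a reduced, assumed column set."""
    conn = sqlite3.connect(":memory:")
    for table in SEARCHABLE_TABLES:
        conn.execute(
            f"CREATE TABLE {table} (name TEXT, chrom TEXT, strand TEXT, "
            "exonStarts TEXT, exonEnds TEXT)"
        )
    # Made-up rows: the same name appears in both tables with different exons.
    conn.execute(
        "INSERT INTO refGene VALUES ('NM_000001', 'chr1', '+', '100,300,', '200,400,')"
    )
    conn.execute(
        "INSERT INTO knownGene VALUES "
        "('NM_000001', 'chr1', '+', '100,300,500,', '200,400,600,')"
    )
    return conn


def parse_blocks(starts, ends):
    """Turn comma-separated start/end lists (trailing comma allowed) into pairs."""
    start_list = [int(x) for x in starts.split(",") if x]
    end_list = [int(x) for x in ends.split(",") if x]
    return list(zip(start_list, end_list))


def find_locations(conn, name, tables=SEARCHABLE_TABLES):
    """Return {table: [(chrom, strand, [(start, end), ...]), ...]} for a name."""
    hits = {}
    for table in tables:
        # Table names cannot be bound as SQL parameters, so check them first.
        if table not in SEARCHABLE_TABLES:
            raise ValueError(f"unknown table: {table}")
        rows = conn.execute(
            f"SELECT chrom, strand, exonStarts, exonEnds FROM {table} WHERE name = ?",
            (name,),
        ).fetchall()
        if rows:
            hits[table] = [
                (chrom, strand, parse_blocks(starts, ends))
                for chrom, strand, starts, ends in rows
            ]
    return hits
```

Calling find_locations(conn, "NM_000001", ["refGene"]) restricts the search
to one table, while omitting the table list searches every table in the
allowlist and returns one entry per table that has a match.  Fetching the
residues themselves would still require the flat files.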

Sean