[Bioperl-l] UCSC database backend
Sean Davis
sdavis2 at mail.nih.gov
Fri Sep 1 11:53:14 UTC 2006
On Thursday 31 August 2006 19:53, Caleb Davis wrote:
> Hi folks, first time caller here. Love the show!
>
> I just started going through the archive and saw this thread. I vote in
> favor of this interface, for what it's worth. What about doing it this
> way?:
>
> $objSeqIO = Bio::SeqIO->new(-file => '~/seq/myseqCustomTrack.bed',
> -format => 'bed',
> -assembly => 'hg18',
> -track => 'hg18_myfavgenes'); #see example
Hi, Caleb. Welcome to the list.
What you are proposing seems to be two separate but related tasks. First,
parse BED-format files into BioPerl-compatible sequence objects. Second,
once those are loaded, pull the sequence from UCSC if desired.
For the first, you could certainly write a parser for BED format that would
give back sequence objects. You might also want to look at the GFF format,
as there are quite a few tools for GFF parsing, formatting, and sequence
retrieval from local databases.
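As a rough sketch of the parsing half, here is a minimal BED line reader in
plain core Perl. It is not an existing BioPerl module: the function name
parse_bed_line and the hash layout are made up for illustration, and only the
first four BED columns are read. It assumes the usual BED convention of
0-based, half-open coordinates and converts them to 1-based, inclusive ones.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): parse one BED line into a hash ref.
# BED coordinates are 0-based and half-open; the returned start/end are
# 1-based and inclusive. Returns nothing for blank lines, comments, and
# "track"/"browser" header lines.
sub parse_bed_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*$/;
    return if $line =~ /^(?:#|track\b|browser\b)/;

    # BED fields may be separated by tabs or spaces.
    my ($chrom, $start, $end, $name) = split ' ', $line;
    die "Malformed BED line: $line\n"
        unless defined $end && $start =~ /^\d+$/ && $end =~ /^\d+$/;

    return {
        chrom => $chrom,
        start => $start + 1,   # 0-based start -> 1-based start
        end   => $end,         # half-open end equals the inclusive 1-based end
        name  => $name,        # optional fourth column; undef if absent
    };
}

my $feature = parse_bed_line("chr1\t999\t5000\tmygene");
printf "%s:%d-%d\n", @{$feature}{qw(chrom start end)};   # prints chr1:1000-5000
```

A real SeqIO-style parser would wrap these fields in feature or sequence
objects rather than bare hashes; that part is left out here.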
For the second task, if what you are after is a straightforward way of
retrieving arbitrary sequences based on location, then you might want to look
at the DAS service set up by UCSC. Doing what you propose would be as simple
as reading in a format of your choice and then constructing a URL like:
http://genome.ucsc.edu/cgi-bin/das/hg18/dna?segment=chr1:1,5000;segment=chr10:52000,53000
This will return an XML-format file containing two sequences. As you can
see, the construction of the URL is trivial. See here for more information:
http://genome.ucsc.edu/FAQ/FAQdownloads#download23
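To illustrate how little is involved, here is a small sketch in core Perl that
builds such a URL from a list of regions. The function name das_dna_url and the
region hash keys are assumptions for this example, not an existing API, and the
coordinates are taken to be 1-based and inclusive, as in the URL above. It only
constructs the URL; fetching it and parsing the XML are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: build a UCSC DAS "dna" request URL.
#   $assembly - UCSC assembly name, e.g. 'hg18'
#   @regions  - hash refs with chrom, start, end (1-based, inclusive)
sub das_dna_url {
    my ($assembly, @regions) = @_;
    die "At least one region is required\n" unless @regions;

    my @segments;
    for my $region (@regions) {
        push @segments,
            sprintf('segment=%s:%d,%d',
                    $region->{chrom}, $region->{start}, $region->{end});
    }

    # Multiple segments are joined with ';', as in the example URL.
    return "http://genome.ucsc.edu/cgi-bin/das/$assembly/dna?"
         . join(';', @segments);
}

my $url = das_dna_url(
    'hg18',
    { chrom => 'chr1',  start => 1,     end => 5000  },
    { chrom => 'chr10', start => 52000, end => 53000 },
);
print "$url\n";
```

With those two regions this reproduces the URL shown above.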
Sean