[DAS] Finding the ADAM2 Gene via Ensembl DAS

Brian Gilman gilmanb@genome.wi.mit.edu
Sun, 19 Jan 2003 13:24:34 -0500


On 1/19/03 12:59 PM, "Thomas Down" <thomas@derkholm.net> wrote:

Hello everyone, 

    This brings up an interesting point: for organisms where we have a
fairly complete genome, is there ever a time when you don't want features
given to you in chromosome coordinates? I can't remember the last time I
really wanted a readout of the golden path... My buddies in assembly would
kill me right now ;)

    For mouse-human comparisons I don't care to have read coordinates or
even contig coordinates. It is very intuitive to ask for chromosomal (i.e.
global) coordinates for everything you're interested in.

    Do people who use the Ensembl DAS server ever go after reads or
contigs? Or do people just use Thomas' convenience methods?

                        Best,

                            -B


> Once upon a time, on a computer far far away, Ethan Cerami wrote:
>> 
>> First some overview:  if you click on this link:
>> http://www.ensembl.org/Homo_sapiens/contigview?highlight=&chr=8&vc_start=38800000&vc_end=39190000&x=0&y=0,
>> in the detailed panel on the bottom, you will see two
>> known genes, ADAM18 and ADAM2.
>> 
>> I am trying to get this same gene data out of Ensembl
>> via DAS.  I tried several Ensembl data sources,
>> including:  ensembl930, ens_ncbi30refseq
>> (Ensembl-mapped Human RefSeqs), ens930cds (Ensembl
>> CDS).  I finally tried ens_ncbi30trans (NCBI
>> Transcripts).  Here's the query I sent:
>> 
>> http://servlet.sanger.ac.uk:8080/das/ens_ncbi30trans/features?segment=8:38800000,39190000
>> 
>> In the response, I got back 14 features, all named
>> ADAM2, but each one is located at a different
>> location.
>> 
>> So, my questions:
>> 
>> 1.  Am I using the right Ensembl data source?
> 
> No, I don't believe that you are.  The source you're looking
> at is an NCBI genebuild, which I don't think can be expected
> to be the same as Ensembl.
> 
> The core Ensembl data (including gene predictions) is on
> /das/ensembl930/.  But trying the query you show above on
> this datasource isn't going to work... (see below).
> 
>> 2.  Why do I get back 14 ADAM2 Genes, instead of just
>> one?
> 
> One for each exon.  The DAS protocol doesn't have any way
> to return a single FEATURE element with a non-contiguous
> location, so gene structures really have to be returned as
> many individual FEATUREs grouped together.  I note that
> Ensembl actually predicts 13 exons for ADAM2.  14 is close
> enough for me -- maybe NCBI managed to map a bit more UTR
> in this case.
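
To make the "one FEATURE per exon, grouped together" point concrete, here is a minimal sketch of how a client might reassemble gene structures from a features response by collecting FEATURE elements that share a GROUP id. The class name, the helper names, and the sample XML (including its coordinates and the second group id) are made up for illustration; only the element names follow the DASGFF layout. It assumes a response without a DOCTYPE and does no error recovery.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ExonGrouping {

    /** Overall span and exon count of one grouped structure. */
    public static final class Structure {
        public int start = Integer.MAX_VALUE;
        public int end = Integer.MIN_VALUE;
        public int exons = 0;
    }

    // Hypothetical response body: coordinates and the "otherGene" id are invented.
    public static final String SAMPLE =
        "<DASGFF><GFF version=\"1.0\"><SEGMENT id=\"8\" start=\"100\" stop=\"900\">"
        + "<FEATURE id=\"e1\"><TYPE id=\"exon\">exon</TYPE>"
        + "<START>100</START><END>200</END><GROUP id=\"ADAM2\"/></FEATURE>"
        + "<FEATURE id=\"e2\"><TYPE id=\"exon\">exon</TYPE>"
        + "<START>400</START><END>550</END><GROUP id=\"ADAM2\"/></FEATURE>"
        + "<FEATURE id=\"e3\"><TYPE id=\"exon\">exon</TYPE>"
        + "<START>700</START><END>900</END><GROUP id=\"otherGene\"/></FEATURE>"
        + "</SEGMENT></GFF></DASGFF>";

    /** Groups FEATURE elements by GROUP id, keeping order of first appearance. */
    public static Map<String, Structure> group(String dasgffXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(
                new ByteArrayInputStream(dasgffXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse response", e);
        }
        Map<String, Structure> byGroup = new LinkedHashMap<>();
        NodeList features = doc.getElementsByTagName("FEATURE");
        for (int i = 0; i < features.getLength(); i++) {
            Element f = (Element) features.item(i);
            Element g = (Element) f.getElementsByTagName("GROUP").item(0);
            // A feature without a GROUP is keyed by its own id.
            String key = (g != null) ? g.getAttribute("id") : f.getAttribute("id");
            Structure s = byGroup.computeIfAbsent(key, k -> new Structure());
            s.start = Math.min(s.start, Integer.parseInt(text(f, "START")));
            s.end = Math.max(s.end, Integer.parseInt(text(f, "END")));
            s.exons++;
        }
        return byGroup;
    }

    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Structure> e : group(SAMPLE).entrySet()) {
            Structure s = e.getValue();
            System.out.println(e.getKey() + ": " + s.exons + " exons, "
                + s.start + "-" + s.end);
        }
    }
}
```

With a grouping step like this, the 14 ADAM2 features would collapse into a single entry with an exon count of 14.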
> 
>> 3.  Why don't I get back the ADAM18 gene?
> 
> Don't know.  I presume NCBI don't predict it (or, possibly,
> put it somewhere else).
> 
> 
> 
> The big issue here is actually that DAS servers don't *have*
> to provide you the annotation you want in chromosomal coordinates.
> It was implemented in this way so that annotation could potentially
> survive across assembly changes.  The Ensembl DAS server actually
> chooses to serve gene structures in either contig coordinates
> (if the whole gene fits) or else supercontig coordinates
> (the forthcoming version actually drops the supercontigs and
> just has clone, contig, and chromosomal coordinates, so this will
> make life slightly easier).
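
For anyone who does want chromosomal coordinates from contig-level annotation, the arithmetic is simple once you know where the contig sits in the assembly. The following is a rough sketch only: the Placement class, its fields, and the numbers in main are hypothetical, and it assumes the whole contig is placed on the chromosome with 1-based inclusive coordinates.

```java
public class ContigToChromosome {

    /** Where one contig sits on a chromosome (1-based, inclusive). */
    public static final class Placement {
        final int chrStart;
        final int chrEnd;
        final boolean forward;

        public Placement(int chrStart, int chrEnd, boolean forward) {
            this.chrStart = chrStart;
            this.chrEnd = chrEnd;
            this.forward = forward;
        }
    }

    /** Converts a 1-based contig position into a chromosomal position. */
    public static int toChromosome(Placement p, int contigPos) {
        int length = p.chrEnd - p.chrStart + 1;
        if (contigPos < 1 || contigPos > length) {
            throw new IllegalArgumentException("Position outside contig: " + contigPos);
        }
        // A reverse-oriented contig counts down from the chromosomal end.
        return p.forward
            ? p.chrStart + contigPos - 1
            : p.chrEnd - contigPos + 1;
    }

    public static void main(String[] args) {
        // Made-up placement, not taken from any real assembly.
        Placement p = new Placement(38_800_001, 38_900_000, true);
        System.out.println(toChromosome(p, 1500));
    }
}
```

A real assembly can place only part of a contig, so a fuller version would also carry the contig-side start and end.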
> 
> Secondary issue: the Ensembl DAS server will call the gene
> structure ENST00000265708, rather than ADAM2.  This is because
> the DAS protocol doesn't (to the best of my knowledge) support
> synonyms.  The Ensembl server uses ENST numbers as the primary
> ID, on the basis that these are something consistent which every
> single prediction has.
> 
> If you actually want to see it directly from Ensembl, try:
> 
>  
> http://servlet.sanger.ac.uk:8080/das/ensembl930/features?segment=NT_034911;type=exon
> 
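
The two request URLs in this thread follow the same pattern (data source, then "features?segment=", an optional start,stop range, and an optional ";type=" filter), so a small helper can assemble them. This is only a sketch: the class and method names are invented here, it does no URL encoding, and it treats a non-positive start as "no range".

```java
public class DasFeaturesUrl {

    /**
     * Builds a features request for one segment.
     * Pass start <= 0 to omit the range, and type == null to omit the filter.
     */
    public static String build(String sourceUrl, String segment,
                               int start, int stop, String type) {
        StringBuilder sb = new StringBuilder(sourceUrl);
        if (!sourceUrl.endsWith("/")) {
            sb.append('/');
        }
        sb.append("features?segment=").append(segment);
        if (start > 0 && stop >= start) {
            sb.append(':').append(start).append(',').append(stop);
        }
        if (type != null) {
            // Request arguments are separated with ';' as in the URL above.
            sb.append(";type=").append(type);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(
            "http://servlet.sanger.ac.uk:8080/das/ensembl930",
            "NT_034911", 0, 0, "exon"));
    }
}
```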
> A better bet would be to use some dedicated DAS client code, such
> as that included in the BioJava library, to access this data.
> This will handle all the sequence assembly issues for you, so you
> can do:
> 
>   SequenceDB ensemblDAS = new DASSequenceDB(
>      new URL("http://servlet.sanger.ac.uk:8080/das/ensembl930/")
>   );
>   Sequence chr = ensemblDAS.getSequence("8");
>   FeatureHolder someFeatures = chr.filter(
>       new FeatureFilter.OverlapsLocation(
>           new RangeLocation(38800000, 39190000)
>       )
>   );
> 
> And get back what you expect.
> 
>   Thomas.

-- 
Brian Gilman <gilmanb@genome.wi.mit.edu>
Group Leader Medical & Population Genetics Dept.
MIT/Whitehead Inst. Center for Genome Research
One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
phone +1 617  252 1069 / fax +1 617 252 1902