[Biojava-l] New Ensembl Release

Warth,Rainer,LAUSANNE,NRC-BS rainer.warth at rdls.nestle.com
Wed May 7 13:24:47 EDT 2003


Dear Thomas,
  the link you mention is very useful. I understand that example 4 will
generate the mapping classification file (mapped, merged, split, lost).
I assume that with these files I can update my dataset here.
	Do you know if ENSEMBL provides these files with a release?

Thanks, Rainer



-----Original Message-----
From: Thomas Down [mailto:thomas at derkholm.net]
Sent: Wednesday, 7 May 2003 12:03
To: Warth,Rainer,LAUSANNE,NRC-BS
Cc: biojava-l at biojava.org
Subject: Re: [Biojava-l] New Ensembl Release


Once upon a time, Warth,Rainer,LAUSANNE,NRC-BS wrote:
> Dear all,
>   with the expected update of the human genome I ask myself how I update
> my data. A first approach will be to compare the ensembl peptide/gene
> FASTA sequence file with the new release. The comparison should probably
> look at the sequence as well as the FASTA header. Does anybody have a
> program using biojava which would do this? Does anybody know if
> ENSEMBL will provide it?

What do you want to learn from this comparison?

As part of the Ensembl release process, the sequences of
predicted genes are compared with previous releases.  Only
if the sequences are very similar are the ENS* IDs
reused.  I can't remember the precise criterion for
"very similar", but there's some information about the
id-mapping process at:

    http://www.ensembl.org/java/README-id-mapping.txt

If you want more details, it might be worth contacting the
ensembl-dev mailing list.

If you want more detailed analysis, I don't know of any
off-the-shelf tools which would help you.  Depending on
precisely what you want to find, extracting sequences from
the database with biojava-ensembl then comparing them either
by running an external tool (blast?) or using the biojava
dynamic programming and alignment toolkit might be a
sensible strategy.
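
As a starting point for the comparison step, here is a minimal sketch in
plain Java. It makes no BioJava or biojava-ensembl calls and assumes the
two releases have already been dumped to FASTA text. The class name
FastaReleaseDiff, its methods, and the status labels (unchanged, changed,
lost, new) are hypothetical and are not Ensembl's own mapping categories.
It only tests for exact sequence identity per ID; sequences flagged as
"changed" would still need a real alignment (blast or dynamic programming)
to judge how similar they are.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of BioJava or Ensembl.
final class FastaReleaseDiff {

    /** Parses FASTA text into a map from record ID to concatenated sequence. */
    static Map<String, String> parseFasta(String fastaText) {
        Map<String, String> records = new LinkedHashMap<>();
        String id = null;
        StringBuilder seq = new StringBuilder();
        for (String raw : fastaText.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                if (id != null) {
                    records.put(id, seq.toString());
                }
                // Assumption: the ID is the first whitespace-delimited token
                // of the header line.
                id = line.substring(1).trim().split("\\s+")[0];
                seq.setLength(0);
            } else if (id != null) {
                seq.append(line);
            }
        }
        if (id != null) {
            records.put(id, seq.toString());
        }
        return records;
    }

    /** Labels every ID seen in either release by comparing sequences exactly. */
    static Map<String, String> classify(Map<String, String> oldRelease,
                                        Map<String, String> newRelease) {
        Map<String, String> status = new TreeMap<>();
        for (Map.Entry<String, String> e : oldRelease.entrySet()) {
            String newSeq = newRelease.get(e.getKey());
            if (newSeq == null) {
                status.put(e.getKey(), "lost");
            } else if (newSeq.equalsIgnoreCase(e.getValue())) {
                status.put(e.getKey(), "unchanged");
            } else {
                status.put(e.getKey(), "changed");
            }
        }
        for (String id : newRelease.keySet()) {
            if (!oldRelease.containsKey(id)) {
                status.put(id, "new");
            }
        }
        return status;
    }

    public static void main(String[] args) {
        // Tiny in-memory example with made-up IDs and sequences.
        String oldText = ">pepA old header\nMKT\nAA\n>pepB\nMMM\n";
        String newText = ">pepA new header\nMKTAA\n>pepC\nCCC\n";
        classify(parseFasta(oldText), parseFasta(newText))
                .forEach((id, s) -> System.out.println(id + "\t" + s));
    }
}
```

For real files, the text could be read with
new String(java.nio.file.Files.readAllBytes(path)) before calling
parseFasta. Whole-genome peptide files would be better processed as a
stream than held in memory as one string.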

      Thomas.

