[Biojava-l] Ensembl gene parsing

Stein Aerts stein.aerts at esat.kuleuven.ac.be
Wed Jan 29 15:56:03 EST 2003


I'm not at all dependent on EMBL files. I only need a sequence with some 
annotation and some features. Any format that is parsable with BioJava 
would do. What alternatives are currently at hand?

Stein.




Arne Stabenau wrote:

>
>
> Stein Aerts wrote:
>
>>
>> OK, then we will wait until Monday.
>> I am indeed considering using ensj. Could you tell me how to 
>> construct an EMBL-formatted flat file of a gene (with some 
>> features of choice) using ensj? I couldn't find that in the 
>> documentation.
>
>
> Ensj does not support EMBL flat file dumps, and I am seriously 
> considering not adding them for EnsEMBL in the future. It's not a 
> well-documented format, and it is occasionally very complicated. Is 
> there a reason why you depend so much on EMBL files?
> I would rather provide alternatives.
>
> Arne
>
>
>
>>
>> Regards and thanks a lot,
>> Stein.
>>
>> Arne Stabenau wrote:
>>
>>> Hi Stein,
>>>
>>> The EMBL export function on the current website worked when we 
>>> released the site. For some reason the mistakes you spotted were 
>>> introduced later. We tested the new release website, which will 
>>> come out next Monday, and it doesn't seem to have the problem (yet). 
>>> So I would like to take the easy route for us and wait for the next 
>>> release. We will, however, be careful not to reinvent the bug in 
>>> that one.
>>>
>>> If there is any pressing reason for a fix earlier than that, please 
>>> let us know. Please consider using ensj for what you want to do; 
>>> it's as fast as the Perl code for most of the stuff it does. It just 
>>> doesn't give you BioJava objects.
>>>
>>> Arne
>>>
>>>
>>> Stein Aerts wrote:
>>>
>>>>
>>>> The BioJava-Ensembl should be ideal. However, retrieving a gene 
>>>> with flanking sequence based on gene_stable_id using the code below 
>>>> takes a million years.
>>>>
>>>>         Ensembl ens = new Ensembl(
>>>>             org.ensembl.db.sql.SQLDatabaseAdaptor.connectSQL(
>>>>                 dbURL, dbUser, dbPass, dbSchemaVersion)
>>>>         );
>>>>         SequenceDB chromos = ens.getChromosomes();
>>>>         FeatureHolder transHolder = chromos.filter(
>>>>             new FeatureFilter.ByAnnotation("ensembl.gene", "ENSG00000167779")
>>>>         );
>>>>
>>>> The output gives:
>>>>
>>>> Querying:  where contig_id = '592075'
>>>> Querying:  where contig_id = '162233'
>>>> Querying:  where contig_id = '162238'
>>>> Querying:  where contig_id = '162241'
>>>> etc.
>>>>
>>>> So that is not very efficient.
>>>>
>>>> Would there be an alternative here that is similar to the export 
>>>> data function (based on any feature: gene, contig, clone, cDNA, 
>>>> peptide...), which runs via HTTP and is very fast?
>>>
>>>
>>>
>>>
>>> If you want to see fast, construct URLs for the Mart and extract the 
>>> data you want from the result ...
>>>
>>>>
>>>>     Stein.
>>>>
>>>>
>>>> Thomas Down wrote:
>>>>
>>>>> On Wed, Jan 29, 2003 at 09:58:18AM +0000, Ewan Birney wrote:
>>>>>  
>>>>>
>>>>>> (c) If you don't like Perl ( ... this is the biojava mailing 
>>>>>> list...) then there is a pretty complete and stable Java binding 
>>>>>> to Ensembl - it doesn't use BioJava - it is more just a vanilla 
>>>>>> Java binding to Ensembl. Craig Melsopp is the lead on that. The 
>>>>>> web page is
>>>>>>
>>>>>> http://www.ensembl.org/java/
>>>>>>   
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> (d) There's also a completely different BioJava-based mechanism
>>>>> for accessing Ensembl databases:
>>>>>
>>>>>   http://biojava.org/pipermail/biojava-l/2002-December/003418.html
>>>>>
>>>>> Unlike ensj, this is 100% read-only.  It does give you access
>>>>> without an additional API, though, and as far as I know it's the
>>>>> only thing which supports multiple versions of the Ensembl database
>>>>> schema off a single codebase.
>>>>>
>>>>>     Thomas.
>>>>>
>>>>>  
>>>>>
>>>>
>>>> -- 
>>>> Stein Aerts BioI at SISTA
>>>> K.U.Leuven ESAT-SCD Belgium
>>>> http://www.esat.kuleuven.ac.be/~dna/BioI
>>>>
>>>>
>>>
>>
>

-- 
Stein Aerts BioI at SISTA
K.U.Leuven ESAT-SCD Belgium
http://www.esat.kuleuven.ac.be/~dna/BioI
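
The Mart suggestion quoted above (construct a URL, then extract the data 
you want from the result) might look roughly like the sketch below. Nothing 
in the thread specifies the Mart's URL scheme or output format, so the base 
URL, the parameter names ("gene", "flank") and the tab-separated result 
layout are all hypothetical placeholders. The HTTP fetch itself is left out; 
the sketch only covers building the URL and splitting the returned text into 
fields, using the standard Java library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MartUrlSketch {

    /** Appends URL-encoded key=value pairs to a base URL, in map order. */
    public static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep)
              .append(encode(e.getKey()))
              .append('=')
              .append(encode(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 support is required of every Java runtime.
            throw new IllegalStateException(ex);
        }
    }

    /** Splits tab-separated text into rows of fields, skipping blank lines. */
    public static List<String[]> parseTsv(String text) {
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\r?\n")) {
            if (!line.trim().isEmpty()) {
                rows.add(line.split("\t", -1));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical parameter names; the real Mart interface may differ.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("gene", "ENSG00000167779");
        params.put("flank", "1000");
        // Hypothetical base URL, used only to show the shape of the request.
        System.out.println(buildUrl("http://mart.example.org/export", params));
    }
}
```

A LinkedHashMap keeps the parameters in insertion order, so the generated 
URL is reproducible from run to run.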




