Problem when building database index

Peter Rice pmr at ebi.ac.uk
Fri Feb 21 09:24:17 UTC 2003


Frankie Cheung wrote:

> Can anyone help me? I don't know how to build database index in EMBOSS
> for the following databanks as I can't find any related information
> from the administration manual:

Some of these are not sequence databases - but I do plan to add more 
database types in the near future (I have already started to extend the 
emboss.default syntax for them).

> - PDB

The domainatrix EMBASSY package uses cleaned PDB files. Do you want the 
sequences or the structures?

> - PFAM
> - PRODOM

Alignment databases are high on my list of things to do. One (small) 
problem is how to name the individual sequences in an alignment, for 
example in a PFAM entry.

> - OGLYC
> - BLOCKS
> - TAXONOMY
> - ENZYME

How would you use these in EMBOSS? Or do you just want to use them with 
entret? (entret is only for sequences, but we can make a general version.)

> - dbEST
> - dbSTS
> - dbGSS

These are available as FASTA format files (so you can use dbifasta) - or 
you can index the huge flatfile versions with SRS and use the SRSFASTA 
access method (which asks SRS to write the sequence in FASTA format, and 
then reads it into EMBOSS)

> - dbSNP (XML format now: would EMBOSS consider to allow build XML db
> index in next version ?)

Yes, we will consider XML format databases - but how would you use dbSNP 
entries in EMBOSS?

> - UNIGENE

The UNIGENE clusters are available as "almost" FASTA files (they have 
headers for each cluster). You can index them in SRS and use the SRSFASTA 
access method. I am looking at skipping the headers and allowing 
dbifasta to index these files directly - but there is a choice of 
clusters (see PFAM above) or single sequences as "entries". In SRS the 
UNIGENE data can be indexed in both ways.
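
Until dbifasta can skip the headers itself, one workaround is to filter 
them out before indexing. The following is only a rough sketch in Python, 
not anything that exists in EMBOSS: the function name 
strip_cluster_headers is hypothetical, and it assumes that each cluster 
header is a single line starting with "#", which should be checked 
against the actual UNIGENE files. It treats single sequences, not 
clusters, as the "entries".

```python
import io


def strip_cluster_headers(lines, header_prefix="#"):
    """Yield only the FASTA lines from an "almost FASTA" cluster file.

    Assumption (not taken from any file specification): each cluster
    header is one line beginning with header_prefix. Blank lines are
    dropped as well, so the result is plain multi-sequence FASTA.
    """
    for line in lines:
        if line.startswith(header_prefix):
            continue  # per-cluster header line: skip it
        if not line.strip():
            continue  # blank separator line: skip it
        yield line


# Tiny made-up sample; the header text and sequence IDs are hypothetical.
sample = io.StringIO(
    "# cluster Hs.1 (hypothetical header)\n"
    ">seq1 /ug=Hs.1\n"
    "ACGT\n"
    "\n"
    "# cluster Hs.2 (hypothetical header)\n"
    ">seq2 /ug=Hs.2\n"
    "GGCC\n"
)
cleaned = "".join(strip_cluster_headers(sample))
print(cleaned, end="")
```

The cleaned output could then be written to a file and indexed with 
dbifasta like any other FASTA file. Cluster membership is lost from the 
file structure this way and survives only if it is also recorded on the 
sequence description lines.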

> - LocusLink

How would you use this in EMBOSS?

> - OMIM


> - InterPro

Ah, XML again! In progress.

Does anyone else have requests for databases under EMBOSS to help set 
priorities for this work?


Hope this helps,

Peter



