[EMBOSS] A question about CON entries

Rodrigo Lopez rls at ebi.ac.uk
Mon Feb 4 09:57:00 UTC 2008


Hi,

In the context of this thread I think it is worth pointing out that the 
CON entries in EMBL exist in expanded form (i.e. with the sequence) on 
the EBI ftp server in the following forms:

EMBL CONTIGS EXPANDED ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con

EMBL ANNOTATED CON ftp://ftp.ebi.ac.uk/pub/databases/embl/annotated_con

For comments and suggestions regarding these entries please contact:
http://www.ebi.ac.uk/embl/Contact/
http://www.ebi.ac.uk/support/ - Use subject 'EMBL'

R:)



Guy Bottu wrote:
> Peter Rice wrote:
>> When reading a CON entry we need a database to use to read the true 
>> sequence and features.
>>
>> If we are reading from a database we can add the information in the 
>> database definition.
>>
>> How do we define a default to resolve EMBL CON entries?
>>
>> Can we handle EMBL release and EMBL updates?
> 
> There are a number of practical issues :
> - an entry with "join" information can come from a databank as well as 
> from a file.
> - EMBL and GenBank CON entries refer to segments in the same databank, 
> but RefSeq refers to GenBank.
> - a sequence presented to EMBOSS can be of CON or ANN type but already
> have a re-assembled sequence (depending on where it comes from).
> - each site has its own DB entries in emboss.default, so code that 
> explicitly says "search in embl" might not work
> 
> So, IMHO :
> - We need code for two cases: embl format (for EMBL, ...) and GenBank
> format (for GenBank, RefSeq, ...). The software must check whether the
> entry contains CO lines (embl format) or CONTIG lines (GenBank format);
> looking for CON in the ID line is not reliable.
> - for databank sequences :  the DB entry in emboss.default should have a 
> parameter that indicates in which databank to search for the segments. 
> If a site has RefSeq and EMBL but no GenBank, then RefSeq could still 
> use sequence information from EMBL. If there is no parameter in the DB
> entry, EMBOSS could, for embl or genbank format entries, either search by
> default in the same databank or simply not attempt the assembly (which do
> you think is best?).
> - for "personal" sequences from files: this is trickier. Maybe an
> associated or advanced parameter that says that if the input sequence is
> of "join" type, a databank or file must be used to retrieve the sequences.
> E.g. -sjoin=xxx or -join=xxx. If xxx is a databank, the segments can be
> retrieved using the standard method defined in emboss.default, and if
> xxx is a file, it can be searched sequentially.
> 
> There are still some issues :
> - the program entret is for retrieving entries as they are rather than
> for processing sequence information. Should entret also try the assembly
> or not?
> - feature information is another matter. Some entries have no feature
> information or only very poor feature information, but there are entries
> whose features differ from those of the segment entries (this is certainly
> so for the ANN entries in EMBL and for RefSeq). How should we handle this?
> 
> 
>     Guy Bottu,
>     BEN
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
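
To illustrate Guy's point above about checking for CO or CONTIG lines rather than for CON in the ID line, here is a minimal Python sketch. It is not EMBOSS code: the names contig_format, contig_segments and SEGMENT_RE are hypothetical, and the sketch assumes only that EMBL flat files mark these lines with "CO" followed by three spaces and that GenBank flat files use a "CONTIG" keyword with indented continuation lines. It collects accession and range only, and ignores complement() and gap() information.

```python
import re

# One segment reference inside a join(...) expression,
# e.g. "AL358912.1:1..39187" (accession[.version]:start..end).
SEGMENT_RE = re.compile(r"([A-Za-z_0-9]+(?:\.\d+)?):(\d+)\.\.(\d+)")


def contig_format(entry_text):
    """Return 'embl' if the entry has CO lines, 'genbank' if it has a
    CONTIG line, or None if it has neither (hypothetical helper)."""
    for line in entry_text.splitlines():
        if line.startswith("CONTIG"):
            return "genbank"
        if line.startswith("CO   "):
            return "embl"
    return None


def contig_segments(entry_text):
    """Return (accession, start, end) tuples for every segment named in
    the CO or CONTIG lines. Strand and gap information is dropped."""
    parts = []
    in_contig = False  # True while reading GenBank continuation lines
    for line in entry_text.splitlines():
        if line.startswith("CO   "):
            parts.append(line[5:].strip())
        elif line.startswith("CONTIG"):
            in_contig = True
            parts.append(line[6:].strip())
        elif in_contig and line[:1].isspace():
            parts.append(line.strip())
        else:
            in_contig = False
    joined = "".join(parts)
    return [(acc, int(start), int(end))
            for acc, start, end in SEGMENT_RE.findall(joined)]


if __name__ == "__main__":
    # Made-up entry fragment, for illustration only.
    sample = (
        "ID   EXAMPLE; SV 1; linear; genomic DNA; CON; HUM; 80710 BP.\n"
        "CO   join(AL358912.1:1..39187,gap(unk100),\n"
        "CO   complement(AL137229.2:101..41523))\n"
        "//\n"
    )
    print(contig_format(sample))
    print(contig_segments(sample))
```

The returned accessions are what would then have to be looked up in whichever databank or file the site (or the user, via something like the proposed -sjoin) designates.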


