[EMBOSS] A question about CON entries
Rodrigo Lopez
rls at ebi.ac.uk
Mon Feb 4 09:57:00 UTC 2008
Hi,
In the context of this thread I think it is worth pointing out that the
CON entries in EMBL exist in expanded form (i.e. with the sequence) on
the EBI ftp server in the following forms:
EMBL CONTIGS EXPANDED
ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con
EMBL ANNOTATED CON
ftp://ftp.ebi.ac.uk/pub/databases/embl/annotated_con
For comments and suggestions regarding these entries please contact:
http://www.ebi.ac.uk/embl/Contact/
http://www.ebi.ac.uk/support/ - Use subject 'EMBL'
R:)
Guy Bottu wrote:
> Peter Rice wrote:
>> When reading a CON entry we need a database to use to read the true
>> sequence and features.
>>
>> If we are reading from a database we can add the information in the
>> database definition.
>>
>> How do we define a default to resolve EMBL CON entries?
>>
>> Can we handle EMBL release and EMBL updates?
>
> There are a number of practical issues :
> - an entry with "join" information can come from a databank as well as
> from a file.
> - EMBL and GenBank CON entries refer to segments in the same databank,
> but RefSeq refers to GenBank.
> - a sequence presented to EMBOSS can be of CON or ANN type but already
> have a re-assembled sequence (depending on where it comes from)
> - each site has its own DB entries in emboss.default, so code that
> explicitly says "search in embl" might not work
>
> So, IMHO :
> - We need code for two cases: embl format (for EMBL,...) and
> GenBank format (for GenBank, RefSeq,...). The software must check whether
> there are CO or CONTIG lines, respectively, in the entry; looking for CON
> in the ID line is not good.
> - for databank sequences : the DB entry in emboss.default should have a
> parameter that indicates in which databank to search for the segments.
> If a site has RefSeq and EMBL but no GenBank, then RefSeq could still
> use sequence information from EMBL. If there is no parameter in the DB
> entry, EMBOSS could, for embl or genbank format entries, search by default
> in the same databank or simply not try the assembly (which do you think
> is best?).
> - for "personal" sequences from files: this is trickier. Maybe an
> associated or advanced parameter that says that if the input sequence is
> of "join" type it must use a databank or file to retrieve the sequences.
> E.g. -sjoin=xxx or -join=xxx. If xxx is a databank the segments can be
> retrieved using the standard method defined in emboss.default and if
> xxx is a file it can be searched sequentially.
>
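The suggestion above to look for CO or CONTIG lines, rather than for CON in the ID line, could be sketched roughly as follows. This is only an illustration in Python, not EMBOSS code: the function name join_line_kind and the sample entries (including their accession numbers) are made up, and it assumes that EMBL construct lines start with the two-letter code "CO" followed by spaces and that the GenBank CONTIG keyword starts in the first column.

```python
def join_line_kind(entry_text):
    """Report which kind of "join" line a flat-file entry contains.

    Returns "embl" if a CO line is found, "genbank" if a CONTIG line
    is found, and None otherwise. The ID/LOCUS line is deliberately
    not consulted.
    """
    for line in entry_text.splitlines():
        if line.startswith("CO   "):
            return "embl"
        if line.startswith("CONTIG"):
            return "genbank"
        if line.startswith("//"):
            # End of the first entry: stop scanning.
            break
    return None


# Made-up sample entries, for illustration only.
embl_con = (
    "ID   XX000001; SV 1; linear; genomic DNA; CON; HUM; 500 BP.\n"
    "CO   join(XX000002.1:1..300,gap(100),XX000003.1:1..100)\n"
    "//\n"
)
genbank_con = (
    "LOCUS       XX_000001   500 bp    DNA     linear   CON\n"
    "CONTIG      join(XX000002.1:1..300,gap(100),XX000003.1:1..100)\n"
    "//\n"
)
id_only = (
    "ID   XX000004; SV 1; linear; genomic DNA; CON; HUM; 500 BP.\n"
    "SQ   Sequence 500 BP;\n"
    "//\n"
)

print(join_line_kind(embl_con))
print(join_line_kind(genbank_con))
print(join_line_kind(id_only))
```

The third sample says CON in its ID line but has no CO line, so the check reports that there is nothing to assemble, which is the case the ID-line test would get wrong.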
> There are still some issues :
> - the program entret is for retrieving entries as they are rather than
> for processing sequence information. Should entret also try the assembly
> or not?
> - feature information is another matter. Some entries have no or very
> poor feature information, but there are entries that have features that
> are different from the segment entries (this is certainly so for the ANN
> entries in EMBL and for RefSeq). How should we handle this?
>
>
> Guy Bottu,
> BEN
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss