[EMBOSS] inosine in nucleotide sequence databases
pmr at ebi.ac.uk
Tue Aug 18 10:05:05 UTC 2009
Wells, Isabelle wrote:
> Can emboss handle inosine in nucleotide sequences? We have a
> nucleotide file in embl format where some sequences contain inosine.
> Dbiflat doesn't seem to index the database properly although no error
> message was given and those inosine containing sequences cannot be
> retrieved with seqret. Any suggestions on what we could do apart from
> replacing inosine by X or N?
I assume your dbiflat problem is an error in retrieving the entries,
unless there is some other format problem in the database that prevents
entries from being recognized by the dbiflat parser. If you can send me
one of the Inosine-containing entries (or a fake entry if they are
proprietary information) I can check.
We treat Inosine as a modified base. These usually occur in RNA sequences.
You should replace it by X or N. If you have an EMBL format feature
table, you could also add a modified_base feature with a /mod_base=I
qualifier to mark each Inosine. EMBOSS does nothing special with these
features in the current release, but you are welcome to suggest
applications that could use the modified base information.
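
A rough sketch of that replacement in Python, in case it is useful. It is
not part of EMBOSS. The function names are made up for illustration, it
expects only the sequence residues (not the whole entry), and the feature
table column layout is my assumption of the usual EMBL flat file
convention, so check it against your own entries. The qualifier value is
a parameter: this message writes it as I, while the INSDC feature table
documentation lists modified base abbreviations in lowercase.

```python
def mask_inosine(seq, replacement="n"):
    """Replace each inosine (I or i) in a sequence string with `replacement`.

    Returns the masked sequence and the 1-based positions that were changed.
    Pass only the sequence residues, not header or feature lines.
    """
    masked = []
    positions = []
    for index, base in enumerate(seq, start=1):
        if base in ("I", "i"):
            masked.append(replacement)
            positions.append(index)
        else:
            masked.append(base)
    return "".join(masked), positions


def modified_base_lines(positions, mod_base="i"):
    """Build feature table lines marking each masked position.

    Assumed layout: feature key starting in column 6, location and
    qualifiers starting in column 22.
    """
    lines = []
    for pos in positions:
        lines.append("FT   {:<16}{}".format("modified_base", pos))
        lines.append("FT" + " " * 19 + "/mod_base={}".format(mod_base))
    return lines


if __name__ == "__main__":
    masked, positions = mask_inosine("acgiut")
    print(masked)
    for line in modified_base_lines(positions):
        print(line)
```

The generated FT lines would still need to be merged into each entry's
existing feature table by hand or by a script of your own.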
Hope this helps,