[EMBOSS] Question regarding Reference Sequence Database
simon.andrews at bbsrc.ac.uk
Thu Nov 30 14:33:40 UTC 2006
On 30 Nov 2006, at 13:57, Jean Mao wrote:
> Can any program in the EMBOSS package make use of the Reference
> Sequence databases? I indexed the refseq databases with dbxflat and ran
> showfeat on them, but received an error about a zero-length sequence:
> Warning: Sequence 'refseqnt-id:NG_002612' has zero length, ignored
> Unable to read sequence 'refseqnt:NG_002612'
NG_ sequences in refseq are a bit odd. They're not real sequences
but a virtual collection of other sequences which are joined together
to make longer assemblies. The records themselves don't actually
contain any sequence (hence the zero length sequence error), just
pointers to parts of other sequences.
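If you want to spot these records before indexing, something along these lines might do as a rough sketch. It assumes the file is in GenBank flat-file format and that a virtual record can be recognised by a CONTIG line with no sequence lines under ORIGIN. The function name and the accessions in the usage note are made up for illustration and are not part of EMBOSS.

```python
def virtual_record_ids(lines):
    """Yield the LOCUS name of each record that has a CONTIG line but no sequence.

    Assumption: GenBank flat-file layout, records terminated by a '//' line.
    """
    name = None
    has_contig = has_seq = in_origin = False
    for line in lines:
        if line.startswith("LOCUS"):
            # Start of a new record: the second field is the entry name.
            name = line.split()[1]
            has_contig = has_seq = in_origin = False
        elif line.startswith("CONTIG"):
            has_contig = True
            in_origin = False  # CONTIG continuation lines are not sequence
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # End of record: report it only if it was pointers and nothing else.
            if name is not None and has_contig and not has_seq:
                yield name
            name, in_origin = None, False
        elif in_origin and line.strip():
            has_seq = True
```

Fed the lines of a file holding one CONTIG-only record (say XX_000001) and one ordinary record, it should yield only XX_000001, which you could then skip or handle separately.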
On the NCBI website they have a facility to join the fragments
together to create a 'real' sequence from them. You could probably
do this if you had all the underlying sequences available, but it's
not something which is likely to be possible during indexing.
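To give an idea of what that joining step involves, here is a minimal sketch. It assumes the pointers are written GenBank-style as join(ACC.V:start..end, gap(N), complement(ACC.V:start..end)) with 1-based inclusive coordinates, and that you already hold every underlying sequence in a dictionary. The function name, the accessions and the sequences are all hypothetical; this is not how NCBI or EMBOSS does it.

```python
import re

# One join() element: optional complement(), accession.version, 1-based range.
_PIECE = re.compile(r"^(complement\()?([A-Za-z0-9_]+\.\d+):(\d+)\.\.(\d+)\)?$")
# A gap of known (or 'unk'-prefixed) length, rendered here as a run of N.
_GAP = re.compile(r"^gap\((?:unk)?(\d+)\)$")
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def assemble_contig(contig, sequences):
    """Build one sequence from a join(...) string and a dict of accession -> sequence."""
    inner = contig.strip()
    if inner.startswith("join(") and inner.endswith(")"):
        inner = inner[len("join("):-1]
    parts = []
    for token in inner.split(","):
        token = token.strip()
        gap = _GAP.match(token)
        if gap:
            parts.append("N" * int(gap.group(1)))
            continue
        piece = _PIECE.match(token)
        if piece is None:
            raise ValueError(f"unrecognised element: {token!r}")
        comp, acc, start, end = piece.groups()
        # Convert the 1-based inclusive range to a Python slice.
        fragment = sequences[acc][int(start) - 1:int(end)]
        if comp:
            # Reverse-complement fragments taken from the opposite strand.
            fragment = fragment.translate(_COMPLEMENT)[::-1]
        parts.append(fragment)
    return "".join(parts)


# Made-up example data, for illustration only.
example = assemble_contig(
    "join(A.1:1..4,gap(3),complement(B.1:2..5))",
    {"A.1": "ACGTACGT", "B.1": "TTGGCCAA"},
)
```

A missing accession raises a KeyError here, which is the practical problem: every referenced sequence has to be available before any record can be rendered.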
EMBOSS works fine with normal refseq files, but I wouldn't say it's
reasonable to expect it to cope with these virtual records. It
would be nice if NCBI offered an option to download rendered versions
of these sequences, but as many of them are pretty big, that might be
a very large data set.