[EMBOSS] Question regarding Reference Sequence Database

Simon Andrews simon.andrews at bbsrc.ac.uk
Thu Nov 30 14:33:40 UTC 2006


On 30 Nov 2006, at 13:57, Jean Mao wrote:

> Hi,
>
> Can any program in the EMBOSS package make use of the Reference
> Sequence databases? I indexed the RefSeq databases with dbxflat and
> ran showfeat against them, but received an error about a zero-length
> sequence:
>
> Warning: Sequence 'refseqnt-id:NG_002612' has zero length, ignored
> Error: Unable to read sequence 'refseqnt:NG_002612'


NG_ sequences in RefSeq are a bit odd.  They're not real sequences
but virtual collections of other sequences, which are joined together
to make longer assemblies.  The records themselves don't actually
contain any sequence (hence the zero-length sequence error), just
pointers to parts of other sequences.
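
To make the "pointers" idea concrete: in GenBank-format flat files,
records assembled from other sequences typically carry a CONTIG line
holding a join(...) expression of accession:start..end ranges and
gap(n) placeholders. That is the general GenBank convention; treat it
as an assumption about these particular files. The Python sketch below
uses a hypothetical parse_contig helper to turn a simple expression of
that shape into an ordered list of segments. It assumes the expression
has already been collected onto one line and handles only plain
ranges, complement(...) ranges and gap(n); forms such as gap(unk100)
or nested joins are not covered, and the accessions are placeholders.

```python
import re
from typing import List, Tuple, Union

# A segment is either ("seq", accession, start, end, is_complement)
# or ("gap", length). Coordinates are 1-based and inclusive, as in GenBank.
Segment = Union[Tuple[str, str, int, int, bool], Tuple[str, int]]

_RANGE = re.compile(r"^([A-Za-z0-9_.]+):(\d+)\.\.(\d+)$")
_GAP = re.compile(r"^gap\((\d+)\)$")


def parse_contig(expr: str) -> List[Segment]:
    """Parse a simple join(...) expression into an ordered list of segments.

    Hypothetical helper: only plain ranges, complement(range) and gap(n)
    are supported; anything else raises ValueError.
    """
    expr = expr.strip()
    if not (expr.startswith("join(") and expr.endswith(")")):
        raise ValueError("expected a join(...) expression")
    body = expr[len("join("):-1]
    segments: List[Segment] = []
    for part in body.split(","):
        part = part.strip()
        is_complement = False
        # complement(X) means "take X from the other strand".
        if part.startswith("complement(") and part.endswith(")"):
            is_complement = True
            part = part[len("complement("):-1]
        gap = _GAP.match(part)
        if gap:
            segments.append(("gap", int(gap.group(1))))
            continue
        m = _RANGE.match(part)
        if not m:
            raise ValueError(f"unsupported segment: {part!r}")
        segments.append(
            ("seq", m.group(1), int(m.group(2)), int(m.group(3)), is_complement)
        )
    return segments


if __name__ == "__main__":
    # Placeholder accessions, not real RefSeq records.
    print(parse_contig("join(COMP_A.1:1..10,gap(5),complement(COMP_B.1:3..8))"))
```

The parser only records where each piece comes from; it never touches
sequence data, which is why a record like this has nothing to show
until the pieces are fetched.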

On the NCBI website they have a facility to join the fragments  
together to create a 'real' sequence from them.  You could probably  
do this if you had all the underlying sequences available, but it's  
not something which is likely to be possible during indexing.
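
If you did have all the underlying sequences available locally, the
joining step itself is simple. The Python sketch below is an
illustration, not NCBI's own method: it takes an already-parsed list
of segments plus a dictionary of component sequences keyed by
accession (both hypothetical inputs), slices each fragment using
1-based inclusive coordinates, reverse-complements fragments taken
from the opposite strand, and fills gaps with runs of N. Rendering
gaps as N is an assumption made for this example.

```python
from typing import Dict, List, Tuple, Union

# ("seq", accession, start, end, is_complement) or ("gap", length);
# coordinates are 1-based and inclusive.
Segment = Union[Tuple[str, str, int, int, bool], Tuple[str, int]]

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    # Swap each base for its complement, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]


def render(segments: List[Segment], components: Dict[str, str]) -> str:
    """Join component fragments (and N-filled gaps) into one sequence."""
    parts: List[str] = []
    for seg in segments:
        if seg[0] == "gap":
            parts.append("N" * seg[1])
            continue
        _, accession, start, end, is_complement = seg
        if accession not in components:
            raise KeyError(f"component {accession} is not available locally")
        source = components[accession]
        if start < 1 or end > len(source) or start > end:
            raise ValueError(f"range {start}..{end} is outside {accession}")
        # 1-based inclusive start..end becomes the Python slice [start-1:end].
        fragment = source[start - 1:end]
        parts.append(reverse_complement(fragment) if is_complement else fragment)
    return "".join(parts)


if __name__ == "__main__":
    # Placeholder component sequences, not real RefSeq data.
    components = {"COMP_A.1": "ACGTACGTAC", "COMP_B.1": "GGGCCCAAAT"}
    segments = [
        ("seq", "COMP_A.1", 1, 4, False),
        ("gap", 3),
        ("seq", "COMP_B.1", 7, 10, True),
    ]
    print(render(segments, components))  # ACGTNNNATTT
```

The KeyError branch is the practical obstacle: a flat-file indexer
only sees one record at a time and has no guarantee that every
referenced component is present.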

EMBOSS works fine with normal RefSeq files, but these virtual records
are not something I'd reasonably expect it to cope with.  It would be
nice if NCBI offered an option to download rendered versions of these
sequences, but as many of them are pretty big, that could be a very
large data set.

TTFN

Simon.