NCBI fasta format [was: Re: [Bioperl-l] loading
matthew_pocock at yahoo.co.uk
Tue Jun 10 15:06:40 EDT 2003
Peter Wilkinson wrote:
> Yes, that NCBI doc should be what it's based on. And yes, the lines are
> separated by an 'esc' sequence. I am not sure what we should do about
> that list ... I cannot see any immediate use for keeping the list in
> the sequence. Perhaps as a first implementation we will just drop the
> list and keep the first annotation.
Possibly emit one set of sequence parsing events for each esc-separated
ID line, so that if there are 3 of these, you'd get the 3 sequences out
again? This is sort of like decompressing the "compressed" fasta.
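
A rough sketch of that "decompression", in Python for brevity rather than
as a Bioperl implementation. The function name expand_record and the
(description, sequence) tuple shape are made up for illustration. The
separator is left as a parameter: I believe NCBI's merged description
lines use Ctrl-A ("\x01") rather than a literal escape character, but
treat that default as an assumption and check it against your own data.

```python
from typing import Iterator, Tuple

# Assumption: merged description lines are joined with Ctrl-A.
DEFAULT_SEP = "\x01"


def expand_record(header: str, seq: str,
                  sep: str = DEFAULT_SEP) -> Iterator[Tuple[str, str]]:
    """Yield one (description, sequence) pair per merged description line.

    `header` is the FASTA header line, with or without the leading '>'.
    Every pair shares the same sequence string.
    """
    if header.startswith(">"):
        header = header[1:]
    for defline in header.split(sep):
        defline = defline.strip()
        if defline:  # skip empty pieces left by a stray separator
            yield defline, seq


if __name__ == "__main__":
    merged = ">gi|1|a first entry\x01gi|2|b second entry"
    for description, sequence in expand_record(merged, "ACGT"):
        print(">" + description)
        print(sequence)
```

A real parser would fire its sequence events inside that loop instead of
yielding tuples, but the splitting step would look much the same.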
> Can anyone think of a pressing use for the list of definitions?
Only data completeness. Someone is going to want to look up an entry by
an ID other than the first one listed, and will be confused/angry/noisy
when it doesn't get retrieved. If you just plugged this stuff directly
into the OBDA flat-file indexer, it should really sort of work out OK,
don't you think? But then that would be extra work.
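
To make the lookup concern concrete, here is a minimal sketch of indexing
every ID on a merged header line, not just the first. It is plain Python
and does not use the OBDA indexer; build_id_index, the (offset, header)
input shape, and the Ctrl-A default separator are all assumptions made
for illustration.

```python
from typing import Dict, Iterable, Tuple


def build_id_index(headers: Iterable[Tuple[int, str]],
                   sep: str = "\x01") -> Dict[str, int]:
    """Map every ID found on a header line to that record's file offset.

    `headers` yields (offset, header_line) pairs. The ID is taken to be
    the first whitespace-delimited token of each merged description.
    """
    index: Dict[str, int] = {}
    for offset, header in headers:
        if header.startswith(">"):
            header = header[1:]
        for defline in header.split(sep):
            tokens = defline.split(None, 1)
            if tokens:
                # Keep the first offset seen if an ID appears twice.
                index.setdefault(tokens[0], offset)
    return index


if __name__ == "__main__":
    records = [
        (0, ">gi|1|a first entry\x01gi|2|b second entry"),
        (120, ">gi|3|c third entry"),
    ]
    index = build_id_index(records)
    print(index["gi|2|b"])  # offset of the record that also carries gi|1|a
```

With an index like this, a lookup by the second or third ID lands on the
same record as a lookup by the first.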