[EMBOSS] fuzznuc and sequence ID
Peter Rice
ricepeterm at yahoo.co.uk
Mon Jun 10 11:46:36 UTC 2013
Dear Philippe,
On 10/06/2013 11:28, Philippe DESSEN wrote:
> I use fuzznuc to find some patterns in an extract of the human genome, stored as a FASTA file with several parts:
>
>> chr1:562520-566670
> GGAGTGGTAGCTCTCAGTATAGTCAGCCTCTAAGAAGAGAGCAAATGTTT
EMBOSS reads that identifier as a database named chr1 with an ID of
562520-566670.
But you can also tell EMBOSS that the format is 'pearson', a FASTA
variant that preserves the complete ID. We added it for identifiers
that include ':' or '|' characters.
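To make the difference concrete, here is a toy Python sketch of how the two formats treat a header such as ">chr1:562520-566670". It is an illustration under simplifying assumptions, not EMBOSS's parser: EMBOSS is written in C and, as noted above, also treats '|' specially, while this sketch models only the ':' case. The function name parse_fasta_header is hypothetical.

```python
from typing import Optional, Tuple


def parse_fasta_header(header_line: str, fmt: str = "fasta") -> Tuple[Optional[str], str]:
    """Hypothetical helper: return (database_name, sequence_id) for a header line.

    Simplified model only; it is not how EMBOSS is implemented.
    """
    # Drop the leading '>' and keep the first whitespace-delimited token.
    parts = header_line.lstrip(">").split()
    token = parts[0] if parts else ""
    if fmt == "pearson":
        # Pearson-style reading: the whole token is kept as the ID.
        return None, token
    if ":" in token:
        # 'db:id' reading: text before the first ':' is taken as a database name.
        db, _, seq_id = token.partition(":")
        return db, seq_id
    return None, token


header = ">chr1:562520-566670"
print(parse_fasta_header(header, "fasta"))    # ('chr1', '562520-566670')
print(parse_fasta_header(header, "pearson"))  # (None, 'chr1:562520-566670')
```

The first call mirrors the behaviour described above, where "chr1" is lost from the reported ID; the second keeps the full coordinate string.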
So, if you run:
fuzznuc -sf pearson
it should preserve the full ID.
regards,
Peter Rice