[EMBOSS] fuzznuc and sequence ID

Peter Rice ricepeterm at yahoo.co.uk
Mon Jun 10 11:46:36 UTC 2013


Dear Philippe,

On 10/06/2013 11:28, Philippe DESSEN wrote:
> I use fuzznuc to find some patterns in an extract of the human genome as a fasta file with several parts:
>
>> chr1:562520-566670
> GGAGTGGTAGCTCTCAGTATAGTCAGCCTCTAAGAAGAGAGCAAATGTTT

EMBOSS reads that identifier as a database named chr1 and an ID of
562520-566670.

But you can also tell EMBOSS that the format is 'pearson', a FASTA
variant that preserves the complete ID. We added it for identifiers
that include ':' or '|' characters.
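To make the difference concrete, here is a minimal Python sketch of the two readings of the header from the question. It is a hypothetical model of the behaviour described above, not EMBOSS code: the function name split_fasta_id is made up for illustration, and only the ':' case is modelled (how plain FASTA handles '|' is not shown).

```python
from typing import Optional, Tuple


def split_fasta_id(header: str, pearson: bool = False) -> Tuple[Optional[str], str]:
    """Return (database, identifier) for a FASTA header line.

    Hypothetical illustration only; this is not how EMBOSS is implemented.
    """
    # The ID token is the first whitespace-delimited word after '>'.
    parts = header.lstrip(">").split()
    token = parts[0] if parts else ""
    if pearson:
        # 'pearson' reading: keep the whole token, ':' included.
        return None, token
    if ":" in token:
        # Plain FASTA reading as described: text before ':' is a database name.
        db, seq_id = token.split(":", 1)
        return db, seq_id
    return None, token


header = ">chr1:562520-566670"
print(split_fasta_id(header))                # ('chr1', '562520-566670')
print(split_fasta_id(header, pearson=True))  # (None, 'chr1:562520-566670')
```

The first call shows why the report lists only 562520-566670 as the ID; the second keeps the full chr1:562520-566670 string.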

So, if you run:

fuzznuc -sf pearson

it should preserve the full ID.

regards,

Peter Rice



