[EMBOSS] IDs in output

Peter Rice pmr at ebi.ac.uk
Fri Nov 3 11:01:42 UTC 2006


Hi Bernd,

Bernd Web wrote:
> Hi,
> 
> Sometimes I use an EMBOSS command directly on a FastA file.
> I wonder if it is possible to select the ID used in the output, esp
> for FastA records with an NCBI defline.
> 
>> gi|248166|g|AA21972.1| description...
> 
> in the output of an EMBOSS command becomes:
> AA21972.1|
> 
> It would be very easy if the ID could be chosen to be the GI number.
> At present the ID used depends on the GI record: sp, pdb and pir records
> show different IDs in EMBOSS output.

Did you mistype the defline? There is a defined set of database names that can 
appear in NCBI deflines. If the "|g|" is really "gb", then the ID will be 
AA21972, which is what I would expect.

If the database name is invalid (or a new one unknown to EMBOSS), then we could 
try to use the GI number, but the "EMBOSS way" would be to use the accession 
number from the sequence version. Unfortunately, at present it is using the last 
part of the sequence version ("1") as the ID in your example. I will fix it for 
the next release.
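
To make the pieces of the defline concrete, here is a minimal Python sketch. It 
is not the EMBOSS parser, only an illustration, and it assumes the defline was 
meant to read "gi|248166|gb|AA21972.1|". The function name parse_ncbi_defline 
is made up for this example.

```python
# Illustrative only: this is NOT how EMBOSS itself parses deflines.
# Assumed layout: gi|<gi number>|<database>|<accession>.<version>|<locus> description

def parse_ncbi_defline(defline):
    """Split an NCBI-style 'gi|...' defline into its named parts."""
    # Drop the leading '>' and separate the identifier block from the description.
    header, _, description = defline.lstrip(">").partition(" ")
    fields = header.split("|")
    if len(fields) < 4 or fields[0] != "gi":
        raise ValueError("not a gi|number|db|accession defline: %r" % defline)
    gi, db, seqversion = fields[1], fields[2], fields[3]
    # The sequence version is '<accession>.<version>', e.g. 'AA21972.1'.
    accession, _, version = seqversion.partition(".")
    return {
        "gi": gi,
        "db": db,
        "accession": accession,
        "version": version,
        "description": description.strip(),
    }


parts = parse_ncbi_defline(">gi|248166|gb|AA21972.1| description...")
print(parts["gi"], parts["db"], parts["accession"], parts["version"])
```

With the corrected "gb" field, the accession part is AA21972, and the GI number 
is available as a separate field.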

You can use -sid on the command line to give an ID to a sequence that does not 
have one, but not to replace an existing ID. That seems strange. It may change 
for the next release so that you can always use -sid to define the ID.
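
Until -sid can override an existing ID, one possible workaround outside EMBOSS 
is to rewrite the FastA headers before running the command so that the GI 
number is the first word. The Python sketch below does that. The function names 
gi_as_id and rewrite_fasta are hypothetical, and it assumes that the first word 
of a plain FastA header is what ends up as the ID, which is worth checking with 
the application you use.

```python
# Workaround sketch, not part of EMBOSS: put the GI number first in each header.
# The original 'gi|...' identifier block is kept in the description, so no
# information is lost.

def gi_as_id(line):
    """Return a header line with the GI number as its first word.

    Lines that are not '>gi|<digits>|...' headers are returned unchanged.
    """
    text = line.rstrip("\r\n")
    ending = line[len(text):]          # preserve the original line ending
    if not text.startswith(">gi|"):
        return line
    header, _, description = text[1:].partition(" ")
    fields = header.split("|")
    if len(fields) < 2 or not fields[1].isdigit():
        return line                    # unexpected layout: leave it alone
    parts = [fields[1], header, description.strip()]
    return ">" + " ".join(p for p in parts if p) + ending


def rewrite_fasta(lines):
    """Yield the lines of a FastA file with GI-based IDs in the headers."""
    for line in lines:
        yield gi_as_id(line)


# Example use (the file names are placeholders):
#   with open("in.fasta") as src, open("out.fasta", "w") as dst:
#       dst.writelines(rewrite_fasta(src))
```

Sequence lines pass through untouched, so the output can be given to an EMBOSS 
command in place of the original file.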

Hope that helps

Peter






