[EMBOSS] IDs in output
Peter Rice
pmr at ebi.ac.uk
Fri Nov 3 11:01:42 UTC 2006
Hi Bernd,
Bernd Web wrote:
> Hi,
>
> Sometimes I use an EMBOSS command directly on a FastA file.
> I wonder if it is possible to select the ID used in the output, esp
> for FastA records with an NCBI defline.
>
>> gi|248166|g|AA21972.1| description...
>
> in the output of an EMBOSS command becomes:
> AA21972.1|
>
> It would be very easy if the ID could be chosen to be the GI number.
> At the moment the ID used depends on the GI record: sp, pdb and pir
> records all show different IDs in EMBOSS output.
Did you mistype the defline? There is a defined set of database names that can
appear in NCBI deflines. If the "|g|" is really "gb" then the ID will be AA21972,
which is what I would expect.
If the database name is invalid (or a new one unknown to EMBOSS) then we could
try to use the GI number, but the "EMBOSS way" would be to use the accession
number from the sequence version. Unfortunately, at present it is using the last
part of the sequence version ("1") as the ID in your example. I will fix it for
the next release.
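Until then, one possible workaround is to rewrite the deflines yourself before
running an EMBOSS command. The following is only a rough sketch in Python, not
part of EMBOSS: the names gi_as_id and rewrite_fasta are made up for
illustration, and it assumes every defline of interest starts with
"gi|<number>|". With a plain FASTA header, the first word should then be read
as the ID.

```python
import re

# Matches an NCBI-style defline that starts with "gi|<digits>|".
GI_PATTERN = re.compile(r"^>gi\|(\d+)\|(.*)$")


def gi_as_id(line):
    """Return the FASTA line with the GI number as its first word.

    Hypothetical helper: lines that are not GI deflines (sequence data,
    or headers in another style) are returned unchanged.
    """
    match = GI_PATTERN.match(line)
    if match is None:
        return line
    gi, rest = match.groups()
    # Keep the remainder of the original defline as the description.
    return ">{} {}".format(gi, rest)


def rewrite_fasta(lines):
    """Apply gi_as_id to every line of a FASTA file (hypothetical helper)."""
    return [gi_as_id(line.rstrip("\n")) for line in lines]
```

For the defline in your example this gives ">248166 g|AA21972.1| description...",
so the original identifiers are kept in the description rather than lost.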
You can use -sid on the command line to give an ID to a sequence that does not
have one, but not to replace an existing ID. That seems strange. It may change
for the next release so that you can always use -sid to define the ID.
Hope that helps
Peter