[EMBOSS] IDs in output

Bernd Web bernd.web at gmail.com
Fri Nov 3 12:29:35 UTC 2006


Hi Peter,

Although I copied and pasted it, the defline was indeed wrong. It should have been:

>gi|248166|gb|AAB21972.1| invertase {EC 3.2.1.26} [baker's yeast, Peptide Partial, 6 aa, segment 10 of 12]
ATNTTL

EMBOSS extracts "AAB21972.1".
Keeping the version number is fine, since without it the sequence is not
completely defined (AAB21972 could refer to multiple versions).

My idea was more about selecting the GI number as the ID used in
EMBOSS applications. Currently the ID depends on the database tag in
the defline:
sp -> Entry Name (not the primary accession)
ref, emb, gb -> Accession
pdb -> PDB protein name with the chain concatenated to it
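To make the rules above concrete, here is a small Python sketch that picks the ID the way I described EMBOSS behaving. It is only an approximation written from the observed output, not EMBOSS source code. It assumes the usual NCBI layout >gi|<gi>|<db>|<field1>|<field2> description, and it assumes PDB chains are appended with no separator; the function name emboss_style_id is made up for illustration.

```python
# Hypothetical sketch: approximate the ID EMBOSS shows for NCBI deflines.
# Assumed layout: >gi|<gi>|<db>|<field1>|<field2> description
# The per-database rules follow the list above; this is not EMBOSS code.

def emboss_style_id(defline: str) -> str:
    """Return the ID that EMBOSS appears to use for an NCBI defline."""
    parts = defline.lstrip(">").split(None, 1)
    header = parts[0] if parts else ""     # drop the free-text description
    fields = header.split("|")
    if len(fields) < 4 or fields[0] != "gi":
        raise ValueError(f"not an NCBI gi defline: {defline!r}")
    db = fields[2]
    first = fields[3]
    second = fields[4] if len(fields) > 4 else ""
    if db == "sp":
        return second                      # Swiss-Prot entry name, not the accession
    if db in ("ref", "emb", "gb"):
        return first                       # accession with version, e.g. AAB21972.1
    if db == "pdb":
        return first + second              # PDB name + chain (assumed: no separator)
    raise ValueError(f"database tag {db!r} is not covered by these rules")
```

For the corrected invertase defline this returns "AAB21972.1", which matches what EMBOSS reports.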

Although I wrote a script to map the names from NCBI deflines to
EMBOSS names, it would be convenient to have an option to use the GI
number.
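Until such an option exists, one workaround is to preprocess the FastA file so that the bare GI number becomes the first word of each NCBI defline, keeping the original defline after it. This is a hypothetical Python sketch (gi_number and put_gi_first are made-up names); it assumes, without having tested it, that EMBOSS then takes that first word as the ID, as it does for plain FastA deflines without "|".

```python
import re
from typing import Optional

# Matches the GI number at the start of an NCBI defline: >gi|248166|...
GI_RE = re.compile(r"^>gi\|(\d+)\|")


def gi_number(defline: str) -> Optional[str]:
    """Return the GI number of an NCBI defline, or None if there is none."""
    match = GI_RE.match(defline)
    return match.group(1) if match else None


def put_gi_first(fasta_text: str) -> str:
    """Prefix each NCBI defline with its bare GI number as the first word.

    Non-NCBI deflines and sequence lines are passed through unchanged.
    (Assumption: EMBOSS uses the first word of a plain defline as the ID.)
    """
    out = []
    for line in fasta_text.splitlines():
        gi = gi_number(line)
        if gi is not None:
            # Keep the original defline after the GI so nothing is lost.
            line = f">{gi} {line[1:]}"
        out.append(line)
    return "\n".join(out) + "\n"
```

The original gi|...|gb|... text stays in the description, so the accession can still be recovered from the output.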

Regards,
Bernd





On 11/3/06, Peter Rice <pmr at ebi.ac.uk> wrote:
> Hi Bernd,
>
> Bernd Web wrote:
> > Hi,
> >
> > Sometimes I use an EMBOSS command directly on a FastA file.
> > I wonder if it is possible to select the ID used in the output, especially
> > for FastA records with an NCBI defline.
> >
> >> gi|248166|g|AA21972.1| description...
> >
> > in the output of an EMBOSS command becomes:
> > AA21972.1|
> >
> > It would be very convenient if the GI number could be chosen as the ID.
> > Currently the ID depends on the record type: deflines from sp, pdb and pir
> > show different IDs in EMBOSS output.
>
> Did you mistype the defline? There is a defined set of database names that can
> appear in NCBI deflines. If the "|g|" is really "gb" then the ID will be AA21972
> which is what I would expect.
>
> If the database name is invalid (or a new one unknown to EMBOSS) then we could
> try to use the GI number, but the "EMBOSS way" would be to use the accession
> number from the sequence version. Unfortunately, at present it is using the last
> part of the sequence version ("1") as the ID in your example. I will fix it for the
> next release.
>
> You can use -sid on the command line to give an ID to a sequence that does not
> have one, but not to replace an existing ID. That seems strange. It may change
> for the next release so that you can always use -sid to define the ID.
>
> Hope that helps
>
> Peter


