[EMBOSS] Coderet
David.Bauer at SCHERING.DE
Fri Jan 23 07:16:00 UTC 2004
Hi,
The problem is that -osformat ncbi with coderet produces the NCBI pipe
notation but does not parse the GI number from the CDS feature.
I think it would be a good idea to transfer more tags from the CDS feature
into the ID line of the coderet output.
I'm not sure whether /gene, /protein_id and /product are mandatory for CDS
features, but when they are present it would be nice to transfer them into
the description of the extracted CDS and/or mRNA sequence.
David.
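As a rough illustration of the suggestion above, here is a minimal Python sketch of how the GI number and the /gene, /protein_id and /product qualifiers could be pulled out of a CDS feature and placed in a FASTA header. This is not how coderet works internally. The sample feature text, the function names and the header layout (gi|<number>|<id> followed by key=value pairs) are all assumptions made for the example, and the qualifier values are invented.

```python
import re

# Hypothetical CDS feature block; the values are invented for illustration.
SAMPLE_CDS = '''\
     CDS             100..837
                     /gene="EXAMPLE1"
                     /product="example protein"
                     /protein_id="NP_000000.1"
                     /db_xref="GI:1234567"
                     /db_xref="GeneID:7654"
'''

# Matches quoted qualifiers such as /gene="EXAMPLE1"; values may wrap lines.
QUALIFIER = re.compile(r'/(\w+)="([^"]*)"')


def parse_qualifiers(feature_text):
    """Return {qualifier: [values]}; repeated keys such as db_xref keep all values."""
    quals = {}
    for key, value in QUALIFIER.findall(feature_text):
        # Collapse line wraps and runs of spaces inside a value.
        quals.setdefault(key, []).append(" ".join(value.split()))
    return quals


def find_gi(quals):
    """Return the GI number from a /db_xref="GI:..." qualifier, or None."""
    for xref in quals.get("db_xref", []):
        if xref.startswith("GI:"):
            return xref.split(":", 1)[1]
    return None


def build_header(seq_id, quals):
    """Build a FASTA header line; the layout is an assumption, not an EMBOSS format."""
    gi = find_gi(quals)
    ident = f"gi|{gi}|{seq_id}" if gi else seq_id
    # Only qualifiers that are actually present are copied to the description.
    desc = " ".join(
        f"{key}={quals[key][0]}"
        for key in ("gene", "protein_id", "product")
        if key in quals
    )
    return f">{ident} {desc}".rstrip()


if __name__ == "__main__":
    print(build_header("nm_000367_cds_1", parse_qualifiers(SAMPLE_CDS)))
```

Because missing qualifiers are simply skipped, the sketch does not depend on /gene, /protein_id or /product being mandatory; a feature with none of them falls back to the plain ID.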
From: Henrikki Almusa <henrikki.almusa at helsinki.fi>
Sent by: owner-emboss at hgmp.mrc.ac.uk
To: Sean.Maceachern at dpi.vic.gov.au
Cc: emboss at embnet.org
Subject: Re: [EMBOSS] Coderet
Date: 23.01.04 07:36
On Friday 23 January 2004 07:42, Sean.Maceachern at dpi.vic.gov.au wrote:
> Hello,
>
> I am trying to use coderet to extract CDS from some GenBank flat files. I
> am running into a problem regarding the descriptor line in the output
> FASTA files.
>
> eg)
>
> >nm_000367_cds_1
> ATGGATGGTACAAGAACTTCACTTGACATTGAAGAGTACTCGGATACTGAGGTACAGAAA
> AACCAAGTACTAACTCTGGAAGAATGGCAAGACAAGTGGGTGAACGGCAAGACTGCTTTT
>
> I was hoping someone would be able to tell me how I can change the
> descriptor line from the generic output above (nm_000367_cds_1) to include
> the GI : ID from the flat file? I also think it would be a good idea if the
> ID could be followed by a definition line to make the output more closely
> resemble the output from NCBI.
You can change the sequence format with the -osformat option (available in
all EMBOSS programs that output sequences). The right format is probably
"ncbi". If it isn't, read the page on the EMBOSS web site under
User Documentation -> Sequence format. That page lists all available formats.
Here to help,
--
Henrikki Almusa