[EMBOSS] Coderet

Sean.Maceachern at dpi.vic.gov.au
Fri Jan 23 05:42:12 UTC 2004


Hello,

I am trying to use coderet to extract CDSs from some GenBank flat files. I
am running into a problem with the descriptor line in the output FASTA
files.

eg)

>nm_000367_cds_1
ATGGATGGTACAAGAACTTCACTTGACATTGAAGAGTACTCGGATACTGAGGTACAGAAA
AACCAAGTACTAACTCTGGAAGAATGGCAAGACAAGTGGGTGAACGGCAAGACTGCTTTT

I was hoping someone would be able to tell me how I can change the descriptor line from the generic output above (nm_000367_cds_1) to include the GI ID from the
flat file. I also think it would be a good idea if the ID could be followed by the definition line, to make the output more closely resemble the output
from NCBI.

eg)

>gi|4507652 Homo sapiens thiopurine S-methyltransferase (TPMT), mRNA
ATGGATGGTACAAGAACTTCACTTGACATTGAAGAGTACTCGGATACTGAGGTACAGAAA
AACCAAGTACTAACTCTGGAAGAATGGCAAGACAAGTGGGTGAACGGCAAGACTGCTTTT


Is there any way to do this with the existing options? I am mostly interested in changing the header from the (nm_ ID) form to the (gi| ID) form, but
I think the definition line would also be useful. I'm assuming it shouldn't be too hard, as all of this information exists in the flat file, which looks fairly easy to
parse.

If there is currently no way to do this, I would appreciate it if someone could suggest whether, and where, I could modify the existing program to do the
above.
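Short of changing coderet itself, one hypothetical workaround is to post-process its FASTA output. The sketch below assumes headers follow the `>nm_000367_cds_1` pattern seen in the example output, and that a lookup of accession to (GI, definition) has already been built from the flat file (for instance with a parser like the one sketched above). The function name `rewrite_headers` and the `records` argument are illustrative assumptions, not part of EMBOSS.

```python
import re

def rewrite_headers(fasta_text, records):
    """Rewrite coderet-style '>nm_000367_cds_1' headers as
    NCBI-style '>gi|<GI> <definition>' lines.

    `records` is a hypothetical mapping {accession: (gi, definition)}
    built beforehand from the GenBank flat file; accessions are matched
    case-insensitively because coderet lowercases the entry name.
    """
    lookup = {acc.lower(): value for acc, value in records.items()}
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                # Drop the '_cds_N' suffix to recover the entry name.
                accession = re.sub(r"_cds_\d+$", "", parts[0].lower())
                if accession in lookup:
                    gi, definition = lookup[accession]
                    line = f">gi|{gi} {definition}"
        # Sequence lines and unmatched headers pass through unchanged.
        out.append(line)
    return "\n".join(out) + "\n"
```

Note that if an entry contains several CDSs, every one of them would receive the same GI-based header under this scheme, so some other suffix may be needed to keep the IDs unique.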

Thanks,

Sean MacEachern
