[EMBOSS] Coderet
Sean.Maceachern at dpi.vic.gov.au
Fri Jan 23 05:42:12 UTC 2004
Hello,
I am trying to use coderet to extract CDS from some GenBank flat files. I
am running into a problem regarding the descriptor line in the output FASTA
files.
For example:
>nm_000367_cds_1
ATGGATGGTACAAGAACTTCACTTGACATTGAAGAGTACTCGGATACTGAGGTACAGAAA
AACCAAGTACTAACTCTGGAAGAATGGCAAGACAAGTGGGTGAACGGCAAGACTGCTTTT
I was hoping someone would be able to tell me how I can change the
descriptor line from the generic output above (nm_000367_cds_1) to include
the GI ID from the flat file. I also think it would be a good idea if the
ID could be followed by a definition line, to make the output more closely
resemble the output from NCBI.
For example:
>gi|4507652 Homo sapiens thiopurine S-methyltransferase (TPMT), mRNA
ATGGATGGTACAAGAACTTCACTTGACATTGAAGAGTACTCGGATACTGAGGTACAGAAA
AACCAAGTACTAACTCTGGAAGAATGGCAAGACAAGTGGGTGAACGGCAAGACTGCTTTT
Is there any way to do this currently using the existing options? I am
mostly interested in changing the output from the (nm_ID) form to the
(gi|ID) form, but I think the definition line would also be useful. I'm
assuming that it shouldn't be too hard, as all of this information exists
in the flat file, which looks fairly easy to parse.
If there is no way to do this currently, I would appreciate it if someone
could suggest whether, and where, I could modify the existing script to
accomplish the above.
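To illustrate the kind of parsing I have in mind, here is a rough sketch in
Python. It is not part of EMBOSS or coderet; the function names
genbank_header and relabel_fasta are made up for this example. It assumes
that the flat file has a DEFINITION field (possibly wrapped over several
lines), that the VERSION line carries a "GI:" number, and that the record
yields a single CDS, so every header in the coderet output gets the same
label.

```python
import re


def genbank_header(record_text):
    """Build a '>gi|ID definition' header from one GenBank flat-file record.

    Assumes the VERSION line contains 'GI:<digits>' and that DEFINITION
    continuation lines start with whitespace.
    """
    gi = None
    definition_parts = []
    in_definition = False
    for line in record_text.splitlines():
        if line.startswith("DEFINITION"):
            in_definition = True
            definition_parts.append(line[len("DEFINITION"):].strip())
        elif in_definition and line.startswith(" "):
            # Wrapped DEFINITION text continues on indented lines.
            definition_parts.append(line.strip())
        else:
            in_definition = False
            if line.startswith("VERSION"):
                match = re.search(r"GI:(\d+)", line)
                if match:
                    gi = match.group(1)
    if gi is None:
        raise ValueError("no GI number found on the VERSION line")
    # Drop the trailing period that ends the DEFINITION field.
    definition = " ".join(definition_parts).rstrip(".")
    return ">gi|%s %s" % (gi, definition)


def relabel_fasta(fasta_text, header):
    """Replace each '>' line in the FASTA text with the given header."""
    lines = fasta_text.splitlines()
    return "\n".join(header if l.startswith(">") else l for l in lines)
```

The idea would be to run coderet as usual, read the matching flat file,
and pass both through these two functions. A record with several CDS
features would need a suffix on each header to keep them distinct.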
Thanks,
Sean MacEachern