[EMBOSS] Coderet

Fri Jan 23 07:16:00 UTC 2004

Hi,

the problem is that coderet with -osformat ncbi creates the NCBI pipe
notation, but it does not parse the GI number from the CDS feature.
I think it would be a good idea to transfer more tags from the CDS feature
into the ID line of coderet's output.
I'm not sure whether /gene, /protein_id and /product are mandatory for a CDS,
but where they are present it would be nice to transfer them into the
description of the extracted CDS and/or mRNA sequence.
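A minimal Python sketch of the idea above, for illustration only: it is not how coderet works internally, and the helpers `parse_cds_qualifiers` and `ncbi_style_header` are hypothetical. It assumes the GI number is carried in a `/db_xref="GI:..."` qualifier on the CDS feature, and it uses `ref` as the database tag (an assumption suited to RefSeq records; GenBank entries would normally use `gb`). The sample qualifier values are made up.

```python
import re

# Hypothetical helper: match /key="quoted value" or /key=bare_value in the
# text of one GenBank CDS feature. Quoted values may span several lines.
QUALIFIER_RE = re.compile(r'/(\w+)=(?:"([^"]*)"|(\S+))')


def parse_cds_qualifiers(feature_text):
    """Return {qualifier: [values]}; repeated keys such as /db_xref become lists."""
    quals = {}
    for key, quoted, bare in QUALIFIER_RE.findall(feature_text):
        value = quoted if quoted else bare
        # Collapse line wraps inside text values such as /product.
        value = " ".join(value.split())
        quals.setdefault(key, []).append(value)
    return quals


def ncbi_style_header(seq_id, quals, db="ref"):
    """Build a defline like '>gi|GI|db|seq_id| gene protein_id product'."""
    # Take the first /db_xref="GI:..." value, if there is one.
    gi = next((x[3:] for x in quals.get("db_xref", []) if x.startswith("GI:")), None)
    ident = f"gi|{gi}|{db}|{seq_id}|" if gi else seq_id
    # Optional tags, when present, go into the description in this order.
    desc = [quals[k][0] for k in ("gene", "protein_id", "product") if k in quals]
    return ">" + " ".join([ident] + desc)


# Made-up example feature in GenBank feature-table layout.
feature = '''     CDS             1..60
                     /gene="ABC1"
                     /db_xref="GI:123456"
                     /protein_id="NP_999999.1"
                     /product="example
                     protein"
'''

quals = parse_cds_qualifiers(feature)
print(ncbi_style_header("EXAMPLE_cds_1", quals))
# prints: >gi|123456|ref|EXAMPLE_cds_1| ABC1 NP_999999.1 example protein
```

If no GI cross-reference is found, the sketch falls back to the plain sequence ID, so records without `/db_xref` still get a usable header.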

David.

From:    Henrikki Almusa <henrikki.almusa at helsinki.fi>
Sent by: owner-emboss at hgmp.mrc.ac.uk
Date:    23.01.04 07:36
To:      Sean.Maceachern at dpi.vic.gov.au
Cc:      emboss at embnet.org
Subject: Re: [EMBOSS] Coderet

On Friday 23 January 2004 07:42, Sean.Maceachern at dpi.vic.gov.au wrote:
> Hello,
>
> I am trying to use coderet to extract CDSs from some GenBank flat files. I
> am running into a problem regarding the descriptor line in the output FASTA
> files.
>
> E.g.:
>
> >nm_000367_cds_1
> ATGGATGGTACAAGAACTTCACTTGACATTGAAGAGTACTCGGATACTGAGGTACAGAAA
> AACCAAGTACTAACTCTGGAAGAATGGCAAGACAAGTGGGTGAACGGCAAGACTGCTTTT
>
> I was hoping someone would be able to tell me how I can change the
> descriptor line from the generic output above (nm_000367_cds_1) to include
> the GI:ID from the flat file. I also think it would be a good idea if the ID
> could be followed by a definition line, to make the output more closely
> resemble the output from NCBI.

You can change the sequence format with the -osformat option (available in
all EMBOSS programs that output sequences). The right format is probably
"ncbi". If it isn't, read the page on the EMBOSS web site under User
Documentation -> Sequence format, which lists all available formats.

Here to help,
--
Henrikki Almusa