[Bioperl-l] SGD GFF3 file available soon
Stan Dong
qdong at genome.stanford.edu
Thu Feb 19 01:42:32 EST 2004
Hi Scott,
In my examples, I use Arabic numerals in the seqid column to indicate the
chromosome number, so I should put 'ID=1' in the attribute column of the
first line, which represents the whole chromosome. Since these IDs need to
be unique within the scope of the GFF file, I think it is better to use a
more descriptive name such as 'chr01' in the seqid column (and 'ID=chr01'
in the attribute column).
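
The two constraints discussed here (every seqid in column 1 matches an ID
declared in the same file, and IDs are unique within the file) can be
checked mechanically. Below is a minimal Python sketch, not part of any
SGD or BioPerl tool. The function names and the sample lines are
hypothetical, and it makes two simplifying assumptions: columns are
tab-separated, and any ID that appears more than once is reported as a
duplicate.

```python
from collections import Counter


def parse_attributes(column9):
    """Split an attribute column such as 'ID=chr01;Name=x' into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def check_gff3_ids(lines):
    """Return (duplicate_ids, undeclared_seqids) for an iterable of GFF lines.

    Assumption: any ID seen more than once counts as a duplicate, and a
    seqid is 'undeclared' if no feature in the input carries it as its ID.
    """
    id_counts = Counter()
    seqids = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip lines that do not have nine columns
        seqids.add(cols[0])
        feature_id = parse_attributes(cols[8]).get("ID")
        if feature_id is not None:
            id_counts[feature_id] += 1
    duplicates = sorted(i for i, n in id_counts.items() if n > 1)
    undeclared = sorted(s for s in seqids if s not in id_counts)
    return duplicates, undeclared


if __name__ == "__main__":
    # Hypothetical sample lines using the 'chr01' naming proposed above.
    sample = [
        "chr01\tSGD\tchromosome\t1\t230211\t.\t.\t.\tID=chr01",
        "chr01\tSGD\ttelomere\t1\t801\t.\t-\t.\tID=TEL01L",
    ]
    print(check_gff3_ids(sample))  # prints ([], [])
```

If the first column were left as '1' while the chromosome line declared
'ID=chr01', the second list returned would contain '1'.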
Thanks a lot for your suggestion,
-Stan
On Wed, 18 Feb 2004, Scott Cain wrote:
> Stan,
>
> In your sample GFF, the seqid in the first column has to correspond to
> some ID, usually also defined in the same GFF file. For instance, if
> the features in the GFF file are all on chromosome I, the first column
> of all of those lines would have the same ID as the ID declared for
> chromosome I. For example:
>
> I SGD chromosome 1 230211 . . . ID=I;description=Sequence "I"
> I SGD telomere 1 801 . - 0 ID=TEL01L;description=I left telomeric region;db_xref=SGD:S0028862
> I SGD repeat_family 1 62 . - 0 ID=TEL01L-TR;name=Telomeric Repeat;description=I left telomere TG(1-3);db_xref=SGD:S0028864
> ...etc...
>
> Sorry I didn't point that out before--when I looked at the Excel sheet
> you sent me before, I didn't see all of it (I am too used to working
> with plain text files).
>
> Scott
>
> -------------Original Message---------------
> > Date: Wed, 18 Feb 2004 14:09:27 -0800
> > From: Stan Dong <qdong at genome.stanford.edu>
> > Subject: [Bioperl-l] SGD GFF3 file available soon
> > To: bioperl-l at bioperl.org
> > Message-ID: <1DE37948-625F-11D8-89C8-000A956A0A36 at genome.stanford.edu>
> > Content-Type: text/plain; charset=US-ASCII; format=flowed
> >
> > Hi,
> >
> > I am a programmer at the Saccharomyces Genome Database (SGD,
> > http://www.yeastgenome.org/). I am developing a flat file
> > in GFF3 format ( http://song.sourceforge.net/gff3-jan04.shtml ) to
> > represent the sequence features of the yeast genome, and it will soon
> > be released on our ftp site. This is useful because quite a few open
> > source software packages, such as GBrowse and Chado, can take this
> > file format as input.
> >
> > I would like comments from people who are interested in doing similar
> > things and from those who have good or not-so-good experiences with
> > GFF3 to share. For me, it took a while to get the specification done,
> > especially making the third column (type) fully compatible with the
> > Sequence Ontology (SO). One thing I like about GFF3 is the last column
> > (attributes), where you can put all kinds of useful information, such
> > as, in our case, GO annotation and a nice description of a feature. An
> > example SGD GFF3 file can be viewed here:
> >
> > ftp://genome-ftp.stanford.edu/pub/people/curator/GFF3Example.txt
> >
> > Thanks,
> >
> > Stan Dong
> > Programmer, SGD
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. cain at cshl.org
> GMOD Coordinator (http://www.gmod.org/) 216-392-3087
> Cold Spring Harbor Laboratory
>