[Bioperl-l] GFF to GTF

Cook, Malcolm MEC at stowers-institute.org
Wed Nov 9 13:20:46 EST 2005


So, Projector supposedly expects GTF format as documented at
http://www.fruitfly.org/flyannot/format.html#GTF

I know your problem is 'solved' in one form or another for getting
flybase data in/out from UCSC/Ensembl/GBrowse, but the details elude me.

I wonder if you can't simply avail yourself of UCSC's having done this
already?

For instance, the following 'GTF' is extractable from their site
(http://genome.ucsc.edu/cgi-bin/hgTables?db=dm2) at region
chr4:22000-24000:

chr4	dm2_flyBaseGene	CDS	22338	22528	0.000000	-	2	gene_id "CG32013-RA"; transcript_id "CG32013-RA";
chr4	dm2_flyBaseGene	exon	22335	22528	0.000000	-	.	gene_id "CG32013-RA"; transcript_id "CG32013-RA";
chr4	dm2_flyBaseGene	CDS	22617	23205	0.000000	-	0	gene_id "CG32013-RA"; transcript_id "CG32013-RA";
chr4	dm2_flyBaseGene	start_codon	23203	23205	0.000000	-	.	gene_id "CG32013-RA"; transcript_id "CG32013-RA";
chr4	dm2_flyBaseGene	exon	22617	23205	0.000000	-	.	gene_id "CG32013-RA"; transcript_id "CG32013-RA";

Though it is perhaps not quite 'right', because it is missing the
stop_codon and the exon_ids...
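
To make that check concrete, here is a minimal Python sketch (not part of
Bioperl or Projector) that parses GTF lines like the ones above and reports
which feature types each transcript lacks. The function names
parse_gtf_line and missing_features are hypothetical, and the "required"
feature set (CDS, start_codon, stop_codon) is an assumption based on my
reading of the GTF 2.x docs; adjust it to whatever Projector really needs.

```python
import re
from collections import defaultdict

# GTF column 9 is a series of: key "value"; pairs.
ATTR_RE = re.compile(r'(\w+) "([^"]*)";')


def parse_gtf_line(line):
    """Split one tab-separated GTF line into a dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": dict(ATTR_RE.findall(attrs)),
    }


def missing_features(lines, required=("CDS", "start_codon", "stop_codon")):
    """Map each transcript_id to the required feature types it lacks.

    The default `required` tuple is an assumption, not taken from Projector.
    """
    seen = defaultdict(set)
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        rec = parse_gtf_line(line)
        seen[rec["attributes"].get("transcript_id")].add(rec["feature"])
    report = {}
    for transcript_id, features in seen.items():
        lacking = [f for f in required if f not in features]
        if lacking:
            report[transcript_id] = lacking
    return report
```

Fed the five UCSC lines above (one record per line), missing_features
should flag CG32013-RA as lacking only stop_codon.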

Also, there is some form of GFF conversion included in bioperl's
'process_gadfly'.

good luck

--Malcolm

-----Original Message-----
From: Filipe Garrett [mailto:fgarret at ub.edu] 
Sent: Wednesday, November 09, 2005 11:05 AM
To: Cook, Malcolm; Bioperl
Subject: Re: [Bioperl-l] GFF to GTF


Cook, Malcolm wrote:

>GFF as a format has a variety of versions.  AFAIK, GTF is GFF 2.1 or
>2.5
>
>The main differences have to do with format and semantics of GFF's
>column 9, the inclusion of sequence data itself in the file, and 
>
>version   reference
>1 &  2    http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml
>2.1       http://genes.cs.wustl.edu/GTF21.html
>2.5       ref?
>3         http://song.sourceforge.net/gff3.shtml
>
>What is your source of GFF? It is possible that your GFF annotation
>already is GTF.  What version is it?  Look in the file.  There may be a
>'##gff-version' directive in it.  Or, if you see that column 9 looks
>like 'gene_id "381.000"; transcript_id "381.000.1";' (cf. GFF 2.1
>docs) then it probably already is in GTF.
>
>If it is NOT already GTF (GFF 2.1 or 2.5), then you must provide
>example expected input and output to see if we (I) can help further.
>
>Cheers,
>
>Malcolm Cook - mec at stowers-institute.org - 816-926-4449
>Database Applications Manager - Bioinformatics
>Stowers Institute for Medical Research - Kansas City, MO  USA 
>
>
>
>-----Original Message-----
>From: bioperl-l-bounces at portal.open-bio.org
>[mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Filipe
>Garrett
>Sent: Tuesday, November 08, 2005 8:18 AM
>To: Bioperl
>Subject: [Bioperl-l] GFF to GTF
>
>
>Hi all,
>
>I'm currently trying to use a software called Projector to predict
>genes. As input it needs two sequences and the annotation of one of them
>in GTF. As the info in GTF is included in the GFF I was thinking of
>getting a way to convert the GFF files into GTF. I thought of using the
>Bioperl module for GFF to parse the GFF and write the GTF fields to an
>output.
>
>Does anyone know if there's any script already made? Or of a simple way
>to do it?
>
>Thanks in advance
>
>FG
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>  
>
Hi all,

I'm using GFF v3 from FlyBase.
I attached an example of input and how the output should come out.

Thanks in adv.

FG
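
For a GFF v3 source like this, the conversion could be sketched roughly as
follows. This is a minimal Python sketch and not an existing Bioperl script.
It assumes the GFF3 uses the usual gene -> mRNA -> exon/CDS hierarchy linked
through ID and Parent attributes, and that the GFF3 phase column can be
passed through as the GTF frame. The names parse_gff3_attrs and gff3_to_gtf
are hypothetical, and the sketch does not synthesize start_codon or
stop_codon lines, which Projector may also want.

```python
from urllib.parse import unquote


def parse_gff3_attrs(col9):
    """Parse GFF3 column 9 (key=value;key=v1,v2) into a dict of lists."""
    attrs = {}
    for part in col9.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            # GFF3 values are URL-escaped; commas separate multiple values.
            attrs[key.strip()] = [unquote(v) for v in value.split(",")]
    return attrs


def gff3_to_gtf(lines, transcript_types=("mRNA",), keep=("exon", "CDS")):
    """Return GTF lines for the exon/CDS features of each transcript.

    `transcript_types` and `keep` are assumptions about the input file.
    """
    records = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence data follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        records.append((fields[:8], parse_gff3_attrs(fields[8])))

    # Pass 1: map each transcript ID to its parent gene ID.
    gene_of = {}
    for cols, attrs in records:
        if cols[2] in transcript_types and "ID" in attrs:
            transcript_id = attrs["ID"][0]
            gene_of[transcript_id] = attrs.get("Parent", [transcript_id])[0]

    # Pass 2: emit one GTF line per (feature, parent transcript) pair.
    out = []
    for cols, attrs in records:
        if cols[2] not in keep:
            continue
        for transcript_id in attrs.get("Parent", []):
            if transcript_id not in gene_of:
                continue
            col9 = 'gene_id "%s"; transcript_id "%s";' % (
                gene_of[transcript_id], transcript_id)
            out.append("\t".join(cols) + "\t" + col9)
    return out
```

Typical use would be something like
print("\n".join(gff3_to_gtf(open("input.gff")))), where input.gff is a
placeholder file name. Two passes are used because GFF3 does not guarantee
that a transcript line precedes its exons.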




