[Biopython-dev] GFF3 files in Bio.SeqIO

Peter biopython-dev at maubp.freeserve.co.uk
Tue Feb 27 01:34:04 UTC 2007


Peter wrote:
> Leighton, you also mentioned parsing the NCBI's GFF files, which seem to 
> be a tab separated variable dump of the information found in a GenBank 
> file's features table (link to documentation welcome).
>
> An entire GFF file could be turned into a single SeqRecord with no 
> sequence, but with many sub features as SeqFeatures (akin to the results 
> of the existing "genbank" parser).  The location information would be 
> simplified for GFF.
> 
> Also, it looks like parsing just the CDS entries from a GFF file into 
> "sequence free" SeqRecords would also be sensible... (akin to the 
> existing "genbank-cds" parser).

I went through my old emails, and actually you did point me in this 
direction:

http://song.sourceforge.net/gff3.shtml
http://www.sequenceontology.org/gff3.shtml

The file format does look much more complicated than I had first 
thought.  Interestingly, the file format does allow for FASTA records to 
be appended to it - however the NCBI at least does not do this.
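
To make that layout concrete, here is a rough sketch of what a reader would 
have to cope with.  It assumes only what the GFF3 specification linked above 
describes: nine tab separated columns per feature line, "#" comment and 
directive lines, and an optional "##FASTA" directive after which the rest 
of the file is plain FASTA.  The function name parse_gff3 and the dictionary 
keys are made up for illustration; nothing like this exists in Bio.SeqIO.

```python
from urllib.parse import unquote

# Column names as given in the GFF3 specification.
COLUMNS = ("seqid", "source", "type", "start", "end",
           "score", "strand", "phase", "attributes")


def parse_gff3(lines):
    """Split GFF3 lines into (features, fasta_lines).

    Hypothetical helper, not part of Biopython. Each feature is a plain
    dict keyed by the column names above.
    """
    features = []
    fasta_lines = []
    in_fasta = False
    for line in lines:
        line = line.rstrip("\r\n")
        if in_fasta:
            # Everything after ##FASTA is sequence data, kept untouched.
            if line:
                fasta_lines.append(line)
            continue
        if line.startswith("##FASTA"):
            in_fasta = True
            continue
        if not line or line.startswith("#"):
            continue  # blank line, comment, or other directive
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError("expected 9 columns, got %d" % len(fields))
        record = dict(zip(COLUMNS, fields))
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        # Column 9 holds tag=value pairs separated by ";", with
        # multiple values separated by "," and URL-style escaping.
        attributes = {}
        if record["attributes"] != ".":
            for pair in record["attributes"].split(";"):
                if not pair:
                    continue
                key, _, value = pair.partition("=")
                attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
        record["attributes"] = attributes
        features.append(record)
    return features, fasta_lines


# Made-up example input, not taken from a real NCBI file.
example = [
    "##gff-version 3\n",
    "chr1\tRefSeq\tCDS\t100\t200\t.\t+\t0\tID=cds0;Name=demo\n",
    "##FASTA\n",
    ">chr1\n",
    "ACGT\n",
]
features, fasta_lines = parse_gff3(example)
```

The sketch keeps the features and the trailing FASTA block separate, since 
a file such as the NCBI's simply yields an empty FASTA list.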

Perhaps a more general GFF3 parser would be more useful than a sequence 
orientated one for Bio.SeqIO?

Peter



