[Bioperl-l] getting DNA sequence for exon features from GFF

Chris Fields cjfields at illinois.edu
Thu Aug 26 14:31:59 UTC 2010


On Aug 26, 2010, at 4:02 AM, Peter wrote:

> On Thu, Aug 26, 2010 at 9:53 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
>> 
>> Admittedly I'm not up on the latest uses of GFF, but as far as I know, GFF
>> is an annotation format only; it does not contain the actual sequence.
>> 
>> Have you looked in your GFF file to see if there are nucleotides in there?
>> 
>> Dave
> 
> Actually a GFF file can optionally include a FASTA format sequence
> at the end of the file, although it seems to be more common to just
> supply separate GFF and FASTA files and cross reference by ID.
> 
> Peter

IIRC, optionally including FASTA sequence is specified only in the GFF3 spec; earlier versions don't explicitly mention FASTA. We support it with earlier GFF versions only because the various GFF parsers have converged.

The original GFF spec proposed allowing sequence, but it's in the form of meta information and I have never seen it used in practice (as you mention, the FASTA is normally loaded separately).
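For the GFF3 case the layout is simple enough to handle directly: feature lines come first, and everything after a ##FASTA directive is FASTA. The following is a minimal Python sketch, not BioPerl's API. The function names split_gff3, feature_sequence and exon_sequences are hypothetical. It pulls exon sequences out of such a file, assuming that columns 4 and 5 are 1-based inclusive coordinates, that column 7 is the strand, and that the seqid in column 1 matches the first word of a FASTA header.

```python
from typing import Dict, List, Tuple

# Complement table for unambiguous DNA bases plus N (other IUPAC codes omitted).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def split_gff3(text: str) -> Tuple[List[List[str]], Dict[str, str]]:
    """Return (feature rows, {seqid: sequence}) from GFF3 text.

    Everything after a ##FASTA directive is read as FASTA; before it,
    lines starting with '#' are skipped and the rest are split on tabs.
    """
    features: List[List[str]] = []
    chunks: Dict[str, List[str]] = {}
    current_id = None
    in_fasta = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_fasta:
            if line.startswith(">"):
                # Assumption: the sequence ID is the first word of the header.
                current_id = line[1:].split()[0]
                chunks[current_id] = []
            elif current_id is not None:
                chunks[current_id].append(line)
        elif line.startswith("##FASTA"):
            in_fasta = True
        elif not line.startswith("#"):
            features.append(line.split("\t"))
    sequences = {sid: "".join(parts) for sid, parts in chunks.items()}
    return features, sequences


def feature_sequence(row: List[str], sequences: Dict[str, str]) -> str:
    """Sequence for one GFF3 row; columns 4-5 are 1-based, inclusive coordinates."""
    seqid, start, end, strand = row[0], int(row[3]), int(row[4]), row[6]
    subseq = sequences[seqid][start - 1:end]
    if strand == "-":
        # Reverse-complement features on the minus strand.
        subseq = subseq.translate(_COMPLEMENT)[::-1]
    return subseq


def exon_sequences(text: str) -> Dict[str, str]:
    """Map each exon's ID attribute (or a positional fallback) to its sequence."""
    features, sequences = split_gff3(text)
    result: Dict[str, str] = {}
    for i, row in enumerate(features):
        if len(row) < 9 or row[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in row[8].split(";") if "=" in kv)
        result[attrs.get("ID", f"exon_{i}")] = feature_sequence(row, sequences)
    return result


example = (
    "##gff-version 3\n"
    "ctg1\t.\texon\t3\t8\t.\t+\t.\tID=exon1\n"
    "ctg1\t.\texon\t10\t12\t.\t-\t.\tID=exon2\n"
    "##FASTA\n"
    ">ctg1 test contig\n"
    "ACGTACGTAC\n"
    "GTAC\n"
)
print(exon_sequences(example))  # {'exon1': 'GTACGT', 'exon2': 'ACG'}
```

Percent-encoded attribute values and joining exons into transcripts via their Parent attribute are left out. When the GFF and FASTA come as separate files, the same feature_sequence logic applies once the FASTA has been loaded into the sequences dictionary.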

chris
