[Biopython-dev] GFF file parsing and error handling

carl crott carlcrott at gmail.com
Mon Jan 9 14:36:00 UTC 2012


Hey all,

I'm posting here because I know there has been talk about GFF file parsing,
and I'd love to write some code as soon as I understand what's going on
with these files.


I've got this GFF file (placed in a spreadsheet for readability):

https://docs.google.com/spreadsheet/ccc?key=0AtOqyz8P_fJ0dGVOMzNSM29qUVdjZmZ4emdIQ3U2OUE&hl=en_US#gid=0

Lines 178 and 179 are the problematic lines.

What is going on here?

I know that these genes are listed in reverse order, and that the following
sequence is a normal gene arrangement:
    stop_codon
    CDS
    CDS
    start_codon
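The "normal" arrangement above can be checked mechanically. Below is a minimal sketch in plain Python, not any Biopython API. It assumes the file is GTF-style (tab-separated, with gene_id "..." in the ninth column) and follows the GTF2.2 convention that start_codon lies inside the first CDS while stop_codon lies just outside the last one. It groups the coding features by gene_id, sorts them by coordinates, and matches the resulting sequence of feature types. On the minus strand, ascending coordinates give stop_codon, CDS..., start_codon, which is the listing above. The helper names and the example rows are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: GTF-style ninth column, e.g. gene_id "g1"; transcript_id "g1.t1";
GENE_ID = re.compile(r'gene_id "([^"]+)"')
CODING_TYPES = {"start_codon", "CDS", "stop_codon"}

# Expected feature-type order once one gene's features are sorted by (start, end).
PLUS_PATTERN = re.compile(r"^start_codon( CDS)+ stop_codon$")
MINUS_PATTERN = re.compile(r"^stop_codon( CDS)+ start_codon$")


def group_coding_features(lines):
    """Group start_codon/CDS/stop_codon rows by gene_id (hypothetical helper).

    Returns {gene_id: [(start, end, feature_type, strand), ...]}; comments,
    rows without 9 columns and other feature types are skipped.
    """
    genes = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in CODING_TYPES:
            continue
        match = GENE_ID.search(cols[8])
        if match:
            genes[match.group(1)].append((int(cols[3]), int(cols[4]), cols[2], cols[6]))
    return genes


def has_normal_arrangement(features):
    """True if one gene's features read start_codon, CDS..., stop_codon in the
    direction of transcription (reversed in coordinates on the - strand)."""
    strands = {strand for _, _, _, strand in features}
    if strands == {"+"}:
        pattern = PLUS_PATTERN
    elif strands == {"-"}:
        pattern = MINUS_PATTERN
    else:
        return False  # mixed or unknown strand: not a normal arrangement
    ordered = " ".join(ftype for _, _, ftype, _ in sorted(features))
    return bool(pattern.match(ordered))


# Made-up minus-strand gene laid out like the listing above.
rows = [
    'chr1\tsrc\tstop_codon\t100\t102\t.\t-\t0\tgene_id "g1";',
    'chr1\tsrc\tCDS\t103\t200\t.\t-\t0\tgene_id "g1";',
    'chr1\tsrc\tCDS\t300\t400\t.\t-\t0\tgene_id "g1";',
    'chr1\tsrc\tstart_codon\t398\t400\t.\t-\t0\tgene_id "g1";',
]
print(has_normal_arrangement(group_coding_features(rows)["g1"]))  # True
```

Sorting by coordinates rather than trusting file order means the check still works if the rows for a gene are shuffled, as long as they share a gene_id.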

But my guesses as to what's happening (between lines 178 and 179) are:
1) the gene stretches from the end of one chromosome to another?
2) it's simply a stop_codon with no attached CDS or start_codon?
I've successfully parsed out the gene intervals, and now I'm working on the
error handling.
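For the error handling, one common approach is to validate each row and collect problems with their line numbers instead of aborting on the first bad row. The plain-Python sketch below is not Biopython's parser, and the names parse_gff_lines and Feature are hypothetical. It covers only basic column-level rules: nine tab-separated columns, integer coordinates with 1 <= start <= end, and a strand of +, -, . or ?.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """One parsed row (hypothetical container; only the columns used here)."""
    seqid: str
    ftype: str
    start: int
    end: int
    strand: str
    attributes: str


def parse_gff_lines(lines):
    """Parse 9-column GFF/GTF rows and return (features, errors).

    errors is a list of (line_number, message); bad rows are skipped
    rather than stopping the whole parse.
    """
    features, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments and ## directives
        cols = line.split("\t")
        if len(cols) != 9:
            errors.append((lineno, f"expected 9 tab-separated columns, got {len(cols)}"))
            continue
        seqid, _source, ftype, start, end, _score, strand, _frame, attrs = cols
        try:
            start_i, end_i = int(start), int(end)
        except ValueError:
            errors.append((lineno, f"non-integer coordinates: {start!r}, {end!r}"))
            continue
        if start_i < 1 or start_i > end_i:
            errors.append((lineno, f"invalid interval {start_i}-{end_i}"))
            continue
        if strand not in {"+", "-", ".", "?"}:
            errors.append((lineno, f"invalid strand {strand!r}"))
            continue
        features.append(Feature(seqid, ftype, start_i, end_i, strand, attrs))
    return features, errors


# Made-up input: one good row, then three different kinds of bad row.
rows = [
    "##gff-version 2",
    'chr1\tsrc\tCDS\t103\t200\t.\t-\t0\tgene_id "a";',
    'chr1\tsrc\tCDS\t300\t250\t.\t-\t0\tgene_id "a";',
    'chr1\tsrc\tCDS\tx\t250\t.\t-\t0\tgene_id "a";',
    "chr1 src CDS 1 2",
]
features, errors = parse_gff_lines(rows)
print(len(features))  # 1
for lineno, message in errors:
    print(lineno, message)
```

Returning errors alongside the features lets the caller decide whether a bad row is fatal. Raising an exception on the first problem would be the stricter alternative.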

Thanks,
Carl

-- 
Carl Crott
Web Applications Engineer
www.black-glass.com
412-610-0600


