[Biopython-dev] GFF parser bug?

Brad Chapman chapmanb at 50mail.com
Mon Apr 26 11:56:01 UTC 2010


Eli;

> While trying to use the GFF parser I ran into a value error.
> 
> I think it's probably due to one of the GFF3 fields in my file not being
> specified as 'key=value', but just as 'value'.

Thanks for the report. Oh boy, that's a pretty bad file. In addition
to the bare value without a key that you brought up, there is also a
Parent/Child reference problem. The second line in the GFF you sent
contains two issues:

- A duplicate ID value for GL0000006. ID values are supposed to be
  unique in a file.
- The Parent=GL0000006 attribute should be a reference to the initial
  gene with that ID, but because of the duplicate ID it also refers
  to the mRNA line itself.

> scaffold4215_3  glimmer gene    3       62      .       -       . ID=GL0000006;Name=GL0000006;Lack 3'-end;
> scaffold4215_3  glimmer mRNA    3       62      .       -       . ID=GL0000006;Name=GL0000006;Parent=GL0000006;Lack 3'-end;
> scaffold4215_3  glimmer CDS     3       62      2.84    -       0 Parent=GL0000006;Lack 3'-end;
> scaffold4215_3  glimmer gene    124     1983    .       -       . ID=GL0000007;Name=GL0000007;Complete;
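Tokens like `Lack 3'-end` and `Complete` are the bare values with no
`=`. One plausible way such a token triggers a ValueError is a naive
`key, value = token.split("=")`, which cannot unpack a single-element
list. The sketch below is only an illustration of tolerant parsing, not
the change made in GFFParser; `parse_attributes` and the `_unkeyed`
key are hypothetical names chosen here. It keeps bare values instead of
raising.

```python
def parse_attributes(attr_field):
    """Split a GFF3 column-9 string into a dict of key -> list of values.

    Tokens without '=' (e.g. "Lack 3'-end") are stored under the
    hypothetical key "_unkeyed" rather than raising ValueError.
    """
    attrs = {}
    for token in attr_field.strip().split(";"):
        token = token.strip()
        if not token:
            continue  # a trailing ';' leaves an empty token
        if "=" in token:
            # Split only on the first '=' so values may contain '='.
            key, value = token.split("=", 1)
            attrs.setdefault(key, []).extend(value.split(","))
        else:
            attrs.setdefault("_unkeyed", []).append(token)
    return attrs
```

For the mRNA line above, this yields ID, Name and Parent entries of
`["GL0000006"]` plus `_unkeyed` holding `["Lack 3'-end"]`.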

As Peter mentioned, it would be useful to also file a bug with the
authors of the software that is producing this. Bringing its output in
line with the spec will allow it to be handled by other GFF parsers
as well.

You can get a fixed version of the GFF parser that gracefully
handles these issues at:

http://github.com/chapmanb/bcbb/tree/master/gff/

or apply the changes to GFFParser directly:

http://github.com/chapmanb/bcbb/commit/c530dc1b7d1d6b8b4df211849f969adf4df80a67

Thanks much for the report. Let us know if you have any other
issues,
Brad
