[Bioperl-l] bp_seqfeature_load of latest Flybase GFF annotation fails due to data inconsistency.

Cook, Malcolm MEC at stowers-institute.org
Tue Jan 9 19:38:48 UTC 2007


bash> bp_seqfeature_load.PLS --fast --dsn
'dbi:mysql:database=dmel_r5_1;host=mysql-dev' --create --noverbose <(
flygenegff ./flybase.net/genomes/Drosophila_melanogaster/dmel_r5.1/gff/*.gff )

(note: `flygenegff` used above sorts and filters the GFF input so that
the GFF features are loaded in the order needed: gene before mRNA before

This worked fine with the last release of Flybase.  But now I get:

------------- EXCEPTION  -------------
MSG: FBtr0110936 doesn't have a primary id
STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree
STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load
STACK toplevel

And indeed, sleuthing the data proves that FBtr0110936 is an example of
a Flybase transcript identifier that is annotated as being one of the
multiple parents of exons but that does not itself have an entry in


`grep FBtr0110936 dmel_r5.1/gff/*.gff` returns only exon features (no
gene, CDS, UTR, or mRNA)

... whereas grepping for any of the other three transcripts named as
parents of those exons yields the expected additional features of type
mRNA, protein, CDS, etc.

By the way, this data-bug manifests itself when searching the Flybase
website (FB2006_01, released December 8, 2006) for transcript
FBtr0110936 as:

"ERROR: report for FBtr0110936 not found"

I wonder if anyone can tell me what causes this data problem, and tell
me whether it is ubiquitous (i.e. are there other transcripts mentioned
as exon parents that do not have their own feature)?
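
One way to check how widespread this is would be to scan the GFF for
every Parent= value that never appears as an ID= anywhere in the input.
Below is a minimal sketch of that idea, not a tested tool: the function
name, the sample records, and the IDs gene_x, tr_a, and exon_1 are made
up for illustration (only FBtr0110936 comes from the real data), and it
assumes plain, unescaped ID/Parent values in column 9.

```python
from collections import defaultdict


def find_dangling_parents(gff_lines):
    """Map each Parent= value that has no matching ID= feature
    to the number of child features that reference it."""
    defined = set()                # every ID= value seen
    referenced = defaultdict(int)  # Parent= value -> number of children
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break                  # sequence section: no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue               # not a feature line
        for attr in cols[8].split(";"):
            key, sep, value = attr.strip().partition("=")
            if not sep:
                continue
            if key == "ID":
                defined.add(value)
            elif key == "Parent":
                # GFF3 allows several comma-separated parents
                for parent in value.split(","):
                    referenced[parent] += 1
    # Compare only after reading everything, so feature order is irrelevant.
    return {p: n for p, n in referenced.items() if p not in defined}


# Hypothetical records; coordinates and most IDs are invented.
SAMPLE = [
    "##gff-version 3",
    "2L\tFlyBase\tgene\t100\t900\t.\t+\t.\tID=gene_x",
    "2L\tFlyBase\tmRNA\t100\t900\t.\t+\t.\tID=tr_a;Parent=gene_x",
    "2L\tFlyBase\texon\t100\t300\t.\t+\t.\tID=exon_1;Parent=tr_a,FBtr0110936",
]

print(find_dangling_parents(SAMPLE))  # {'FBtr0110936': 1}
```

On the real files, the same function could be fed an open file handle,
e.g. `find_dangling_parents(open(path))`, once per GFF file or over all
of them chained together.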

I am trying to load this latest Flybase GFF into Lincoln Stein's
Bio::DB::SeqFeature database (using bp_seqfeature_load) but the load
fails due to this data problem.   Any recommendations/workarounds to
this issue are quite welcome.
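
One possible stopgap would be to pre-filter the GFF so that any Parent=
reference to an undefined ID is dropped before the loader sees it. The
sketch below assumes that this is an acceptable fix, which I have not
confirmed biologically, and I have not verified that the loader then
succeeds. The function name and the sample IDs are hypothetical, and the
whole input is held in memory because two passes are needed.

```python
def strip_dangling_parents(gff_lines):
    """Return GFF lines with undefined IDs removed from Parent= attributes."""
    lines = [line.rstrip("\n") for line in gff_lines]

    # Pass 1: collect every ID= value defined anywhere in the input.
    defined = set()
    for line in lines:
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue
        for attr in cols[8].split(";"):
            key, sep, value = attr.strip().partition("=")
            if sep and key == "ID":
                defined.add(value)

    # Pass 2: rewrite Parent= so it only names defined IDs.
    out = []
    for line in lines:
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            out.append(line)      # directives, comments, FASTA: unchanged
            continue
        attrs = []
        for attr in cols[8].split(";"):
            key, sep, value = attr.strip().partition("=")
            if sep and key == "Parent":
                kept = [p for p in value.split(",") if p in defined]
                if not kept:
                    continue      # no valid parent left: drop the attribute
                attr = "Parent=" + ",".join(kept)
            attrs.append(attr)
        cols[8] = ";".join(attrs) or "."
        out.append("\t".join(cols))
    return out


# Hypothetical records; coordinates and most IDs are invented.
SAMPLE = [
    "2L\tFlyBase\tgene\t100\t900\t.\t+\t.\tID=gene_x",
    "2L\tFlyBase\tmRNA\t100\t900\t.\t+\t.\tID=tr_a;Parent=gene_x",
    "2L\tFlyBase\texon\t100\t300\t.\t+\t.\tID=exon_1;Parent=tr_a,FBtr0110936",
]

for cleaned in strip_dangling_parents(SAMPLE):
    print(cleaned)
```

Note that a feature whose only parent is undefined loses its Parent
attribute entirely and so becomes a top-level feature; skipping such
features instead might be the better choice.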

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
