[Bioperl-l] bp_seqfeature_load of latest Flybase GFF annotation fails due to data inconsistency.
Cook, Malcolm
MEC at stowers-institute.org
Wed Jan 10 20:31:26 UTC 2007
Aloha,
For those tracking this (or otherwise lurking), Flybase have released new
versions of the dmel_r5_1 GFF files that remove the data problem.
--Malcolm
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Cook, Malcolm
> Sent: Tuesday, January 09, 2007 1:39 PM
> To: bioperl list; Blanchette, Marco
> Subject: [Bioperl-l] bp_seqfeature_load of latest Flybase GFF
> annotation fails due to data inconsistency.
>
>
> Drat!
>
> bash> bp_seqfeature_load.PLS --fast \
>   --dsn 'dbi:mysql:database=dmel_r5_1;host=mysql-dev' \
>   --create --noverbose \
>   <( flygenegff ./flybase.net/genomes/Drosophila_melanogaster/dmel_r5.1/gff/*.gff )
>
>
> (note: `flygenegff`, used above, sorts and filters the GFF input so that
> the GFF features are loaded in the order needed: gene before mRNA before
> exon)
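
The ordering step described in the note can be sketched as follows. This is not the `flygenegff` script itself, which is not shown in this thread; `TYPE_RANK` and `sort_gff_lines` are hypothetical names, the filtering part is omitted, and the sketch assumes tab-delimited GFF3 with the feature type in column 3.

```python
# Hypothetical stand-in for the ordering done by `flygenegff` (not the real script).
# Assumed load order: parents before children. Unlisted types sort last.
TYPE_RANK = {"gene": 0, "mRNA": 1, "exon": 2}


def sort_gff_lines(lines):
    """Return comment/directive lines first, then features ordered gene, mRNA, exon."""
    header, features = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # simplification: any trailing sequence section is not emitted
        if not line:
            continue
        if line.startswith("#"):
            header.append(line)
        else:
            features.append(line)

    def rank(feature_line):
        cols = feature_line.split("\t")
        ftype = cols[2] if len(cols) > 2 else ""
        return TYPE_RANK.get(ftype, len(TYPE_RANK))

    # sorted() is stable, so features of the same type keep their input order.
    return header + sorted(features, key=rank)
```

Because the sort is stable, features of one type stay in their original file order, so only the parent-before-child constraint is imposed.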
>
> This worked fine with the last release of Flybase. But now I get:
>
> ------------- EXCEPTION -------------
> MSG: FBtr0110936 doesn't have a primary id
> STACK
> Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::load
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
> STACK toplevel
> /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76
>
> And indeed, sleuthing the data proves that FBtr0110936 is an example of
> a Flybase transcript identifier that is annotated as being one of the
> multiple parents of exons but that does not itself have an entry in
> Flybase!
>
> Proof:
>
> `grep FBtr0110936 dmel_r5.1/gff/*.gff` returns only exon features (no
> gene, CDS, UTR, or mRNA)
>
> ... whereas grepping for any of the other three transcripts mentioned
> as parents of those exons yields the expected additional features of
> type mRNA, protein, CDS, etc.
>
> By the way, this data-bug manifests itself when searching the Flybase
> website (FB2006_01, released December 8, 2006) for transcript
> FBtr0110936 as:
>
> "ERROR: report for FBtr0110936 not found"
>
> I wonder if anyone can tell me what causes this data problem, and tell
> me whether it is ubiquitous (i.e. are there other transcripts mentioned
> as exon parents that do not have their own feature)?
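
One way to check how widespread the problem is: scan the GFF for every ID named in a `Parent=` attribute and report those that never appear as the `ID=` of any feature. The sketch below assumes standard GFF3 column 9 syntax (`key=value` pairs separated by `;`, multiple parents separated by `,`); the function names are hypothetical and it has not been run against the Flybase files.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> raw value."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_dangling_parents(lines):
    """Return {parent_id: reference_count} for Parent IDs with no matching ID= feature."""
    defined = set()
    referenced = defaultdict(int)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a feature line (e.g. sequence data after ##FASTA)
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            defined.add(attrs["ID"])
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                referenced[parent] += 1
    # Compare only after reading everything, so feature order does not matter.
    return {p: n for p, n in referenced.items() if p not in defined}
```

Passing `fileinput.input()` as `lines` in a small script would let it read all the `*.gff` files named on the command line in one pass; any ID it reports would be expected to trigger the same "doesn't have a primary id" exception on load.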
>
> I am trying to load this latest Flybase GFF into Lincoln Stein's
> Bio::DB::SeqFeature database (using bp_seqfeature_load), but the load
> fails due to this data problem. Any recommendations or workarounds for
> this issue are quite welcome.
>
>
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>