[Bioperl-l] Problem with parsing ENSEMBL genbank flat file with genbank2gff3.pls

Ewan Birney birney at ebi.ac.uk
Tue Jan 18 04:05:49 EST 2005


On Mon, 17 Jan 2005, Chris Mungall wrote:

> 
> Hi Vladimir
> 
> The genbank2gff3 script, in scripts/Bio-DB-GFF, attempts to recover
> information which the genbank flat file format often loses: namely,
> which mRNA relates to which CDS. You may or may not need this
> information; it depends on why you are doing the conversion. If you
> don't need it, you may want just a straightforward genbank->gff
> conversion. Let me know if this is what you want to do and I can help with
> that.
> 
> If you _do_ wish to preserve the mRNA to CDS mappings, be aware that it
> isn't always possible to recover these with 100% fidelity from the genbank
> flat files. You may wish to pursue alternate approaches, such as
> downloading ensembl as a mysql dump. (Any ensembl folks around? Any plans
> to offer downloads in alternate formats such as gff3? This would be
> fantastic.)

GFF3 output is on the road map for Ensembl because of Vectorbase, but don't
forget that we already offer GTF, a different, well-established GFF-derived
format that is very clean to parse.

To get it: go to the Ensembl website, click on EnsMart and select your
genome. In Filter, unselect the filter by genomic region (to get the entire
genome rather than one region), then in Output select structure and choose
the "GTF" format.
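
As a rough illustration of why GTF is easy to parse, here is a minimal
Python sketch. It is not Ensembl or BioPerl code. It assumes the standard
nine tab-separated GTF columns and attributes written as key "value"; pairs
with no semicolons inside the quoted values. The function names, the
"TX_EXAMPLE_1" transcript id, the "example" source and the coordinates in
the sample lines are made up for the example. Because every GTF line
carries a transcript_id, the exon/CDS-to-transcript relationship that the
genbank flat file loses is stated explicitly.

```python
from collections import defaultdict


def parse_gtf_line(line):
    """Split one GTF line into its nine columns plus an attribute dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields

    # Attributes look like: gene_id "X"; transcript_id "Y";
    # Assumption: no ';' appears inside a quoted value.
    attributes = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
    }


def group_by_transcript(lines):
    """Collect parsed records under their transcript_id, skipping comments."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        record = parse_gtf_line(line)
        groups[record["attributes"].get("transcript_id")].append(record)
    return dict(groups)


# Illustrative lines only: transcript id, source and coordinates are invented.
SAMPLE = [
    '1\texample\texon\t100\t200\t.\t+\t.\t'
    'gene_id "ENSG00000146556"; transcript_id "TX_EXAMPLE_1";',
    '1\texample\tCDS\t120\t200\t.\t+\t0\t'
    'gene_id "ENSG00000146556"; transcript_id "TX_EXAMPLE_1";',
]

if __name__ == "__main__":
    for transcript_id, records in group_by_transcript(SAMPLE).items():
        print(transcript_id, [(r["feature"], r["start"], r["end"]) for r in records])
```

For a real file you would pass an open file handle to group_by_transcript
instead of SAMPLE; a dedicated GTF library is the safer choice if the
attribute values may contain semicolons.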

> 
> If you'd prefer to carry on via the genbank flat file route, here's what
> you should do:
> 
> * get the latest version of genbank2gff3.PLS, which I have just checked
> into cvs (I can send you a copy if you are using a bioperl release and not
> cvs)
> 
> * run the script with the "--ethresh 3" option. This will raise the error
> severity threshold at which problems with the genbank file become
> showstoppers.
> 
> In addition, I will take a look at this particular file and see what it is
> that is causing problems and get back to you.
> 
> Cheers
> Chris
> 
> On Mon, 17 Jan 2005, Babenko, Vladimir (NIH/NLM/NCBI) wrote:
> 
> >     Greetings,
> > While parsing a genbank file taken from:
> > ftp://ftp.ensembl.org/pub/current_human/data/flatfiles/genbank/Homo_sapiens.0.dat
> > as of Jan 2005,
> > I'm getting the following unflattening error:
> > --------------------------------------------------------
> > Processing file /ENSEMBL/Homo_sapiens.0.dat...
> > working on contig
> > chromosome:NCBI35:1:1:994676:1...chromosome:NCBI35:1:1:994676:1 Unflattening
> > error:
> > Details:
> > ------------- EXCEPTION  -------------
> > MSG: PROBLEM, SEVERITY==2
> > no containers possible for SeqFeature of type: CDS; this SF is being placed
> > at root level
> > SF [Bio::SeqFeature::Generic=HASH(0x86485d8)]: CDS; ENSG00000146556
> >
> > STACK Bio::SeqFeature::Tools::Unflattener::problem
> > /Bio/SeqFeature/Tools/Unflattener.pm:940
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_group
> > /Bio/SeqFeature/Tools/Unflattener.pm:1983
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_groups
> > /Bio/SeqFeature/Tools/Unflattener.pm:1744
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
> > /Bio/SeqFeature/Tools/Unflattener.pm:1449
> > STACK (eval) genbank2gff3.PLS:345
> > STACK main::unflatten_seq genbank2gff3.PLS:344
> > STACK toplevel genbank2gff3.PLS:209
> >
> > --------------------------------------
> >
> > Possible gene unflattening error withchromosome:NCBI35:1:1:994676:1: consult
> > STDERR
> >
> > Using bioperl-1.5.0.RC2 under Linux.
> >
> >     Would be grateful for the hint,
> >       Vladimir
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >

-----------------------------------------------------------------
Ewan Birney.  Work:  +44 1223 494420
             Email:  birney "at" ebi.ac.uk 
Clerical Assistant:  shelley "at" ebi.ac.uk
Please cc shelley for urgent or diary-dependent requests
-----------------------------------------------------------------


