[Bioperl-l] Problem with parsing ENSEMBL genbank flat file with genbank2gff3.pls

Chris Mungall cjm at fruitfly.org
Mon Jan 17 14:51:37 EST 2005


Hi Vladimir

The genbank2gff3 script, in scripts/Bio-DB-GFF, attempts to recover
information that the genbank flat file format often loses: namely, which
mRNA relates to which CDS. Whether you need this information depends on
why you are doing the conversion. If you don't need it, you may want just
a straightforward genbank->gff conversion. Let me know if this is what you
want to do and I can help with that.

If you _do_ wish to preserve the mRNA to CDS mappings, be aware that it
isn't always possible to recover these with 100% fidelity from the genbank
flat files. You may wish to pursue alternate approaches, such as
downloading ensembl as a mysql dump. (Any ensembl folks around? Any plans
to offer downloads in alternate formats such as gff3? This would be
fantastic.)
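
To illustrate why the mapping cannot always be recovered, here is a toy
sketch in Python. It is not the Unflattener's actual algorithm: the tuple
layout (type, start, end), the function names, and the simple rule that a
CDS belongs to any mRNA whose span contains it are all assumptions made up
for this example. The "root" case loosely corresponds to the "no containers
possible" situation reported in the quoted message below.

```python
# Illustrative sketch only: NOT the Bio::SeqFeature::Tools::Unflattener code.
# Assumption: a feature is a (type, start, end) tuple, and a CDS may belong
# to any mRNA whose span fully contains the CDS span.

def find_containers(cds, mrnas):
    """Return every mRNA whose span contains the CDS span."""
    _, c_start, c_end = cds
    return [m for m in mrnas if m[1] <= c_start and c_end <= m[2]]

def place_cds(cds, mrnas):
    """Classify the CDS by how many candidate parent mRNAs it has."""
    containers = find_containers(cds, mrnas)
    if not containers:
        return "root"       # no possible parent: placed at the top level
    if len(containers) == 1:
        return "unique"     # mapping recovered unambiguously
    return "ambiguous"      # several candidates: a guess is required

# Hypothetical coordinates, chosen only to exercise the three cases.
mrnas = [("mRNA", 100, 900), ("mRNA", 100, 700)]
print(place_cds(("CDS", 150, 650), mrnas))  # fits both transcripts
print(place_cds(("CDS", 150, 850), mrnas))  # fits only the first
print(place_cds(("CDS", 950, 990), mrnas))  # fits neither
```

The ambiguous case is the one where fidelity is lost: the flat file lists
the features side by side, so coordinates alone may not say which
transcript a CDS came from.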

If you'd prefer to carry on via the genbank flat file route, here's what
you should do:

* Get the latest version of genbank2gff3.PLS, which I have just checked
into cvs (I can send you a copy if you are using a bioperl release and not
cvs).

* Run the script with the "--ethresh 3" option. This will raise the error
severity threshold at which problems with the genbank file become
showstoppers.
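
As a rough picture of what a severity threshold does, here is a toy model
in Python. It is not the script's code: the class and function names are
hypothetical, and the rule that a problem is fatal only when its severity
exceeds the threshold is an assumption about the comparison, which this
message does not spell out.

```python
# Toy model of an error severity threshold; not the script's actual code.
# Assumption: a problem is fatal only when its severity exceeds the threshold.

class UnflattenProblem(Exception):
    """Hypothetical exception type used only in this sketch."""

def report_problem(severity, message, threshold, log):
    """Raise if the problem is severe enough, otherwise record it and go on."""
    if severity > threshold:
        raise UnflattenProblem(f"PROBLEM, SEVERITY=={severity}: {message}")
    log.append((severity, message))

log = []

# With a threshold of 3, a severity-2 problem is only logged.
report_problem(2, "no containers possible for CDS", threshold=3, log=log)
print(log)

# With a lower threshold (hypothetical value 1), the same problem is fatal.
try:
    report_problem(2, "no containers possible for CDS", threshold=1, log=log)
except UnflattenProblem as err:
    print("fatal:", err)
```

Under that assumption, the SEVERITY==2 problem in the quoted trace would be
recorded rather than stopping the run once the threshold is 3.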

In addition, I will take a look at this particular file, see what is
causing the problems, and get back to you.

Cheers
Chris

On Mon, 17 Jan 2005, Babenko, Vladimir (NIH/NLM/NCBI) wrote:

>     Greetings,
> While parsing a genbank file taken from:
> ftp://ftp.ensembl.org/pub/current_human/data/flatfiles/genbank/Homo_sapiens.0.dat
> as of Jan 2005,
> I'm getting the following unflattening error:
> --------------------------------------------------------
> Processing file /ENSEMBL/Homo_sapiens.0.dat...
> working on contig
> chromosome:NCBI35:1:1:994676:1...chromosome:NCBI35:1:1:994676:1 Unflattening
> error:
> Details:
> ------------- EXCEPTION  -------------
> MSG: PROBLEM, SEVERITY==2
> no containers possible for SeqFeature of type: CDS; this SF is being placed
> at root level
> SF [Bio::SeqFeature::Generic=HASH(0x86485d8)]: CDS; ENSG00000146556
>
> STACK Bio::SeqFeature::Tools::Unflattener::problem
> /Bio/SeqFeature/Tools/Unflattener.pm:940
> STACK Bio::SeqFeature::Tools::Unflattener::unflatten_group
> /Bio/SeqFeature/Tools/Unflattener.pm:1983
> STACK Bio::SeqFeature::Tools::Unflattener::unflatten_groups
> /Bio/SeqFeature/Tools/Unflattener.pm:1744
> STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
> /Bio/SeqFeature/Tools/Unflattener.pm:1449
> STACK (eval) genbank2gff3.PLS:345
> STACK main::unflatten_seq genbank2gff3.PLS:344
> STACK toplevel genbank2gff3.PLS:209
>
> --------------------------------------
>
> Possible gene unflattening error withchromosome:NCBI35:1:1:994676:1: consult
> STDERR
>
> Using bioperl-1.5.0.RC2 under Linux.
>
>     Would be grateful for the hint,
>       Vladimir
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

