[Bioperl-l] Re: Problem with Unflattener

Chris Mungall cjm at fruitfly.org
Tue Dec 9 21:19:16 EST 2003


Hi Scott

Bug squashed. Do a cvs update and it should work.

The problem was that this record uses /locus_tag instead of /gene - the
unflattener should be able to detect this in magic mode, but there was one
place where "/gene" was hardcoded.
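
To make the idea concrete, here is a minimal standalone sketch of how a grouping tag could be guessed rather than hardcoded. This is not the code in Unflattener.pm: guess_group_tag, the hash layout of the features and the toy values are all hypothetical, and it assumes nothing beyond core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Unflattener.pm): choose the qualifier
# that groups features into genes by counting which candidate tag occurs
# on the most features.
# Each feature is modelled as a hash ref: { type => ..., tags => { ... } }.
sub guess_group_tag {
    my ( $features, @candidates ) = @_;
    @candidates = ( 'gene', 'locus_tag' ) unless @candidates;

    my %count = map { $_ => 0 } @candidates;
    for my $feature (@$features) {
        for my $tag (@candidates) {
            $count{$tag}++ if exists $feature->{tags}{$tag};
        }
    }

    # Earlier candidates win ties, so 'gene' stays the default.
    my $best = $candidates[0];
    for my $tag (@candidates) {
        $best = $tag if $count{$tag} > $count{$best};
    }
    return $best;
}

# Toy features shaped like the record in question: /locus_tag, no /gene.
my @features = (
    { type => 'gene', tags => { locus_tag => 'locus_A' } },
    { type => 'mRNA', tags => { locus_tag => 'locus_A' } },
    { type => 'CDS',  tags => { locus_tag => 'locus_A' } },
);

print guess_group_tag( \@features ), "\n";    # prints "locus_tag"
```

If detection ever misfires on a record, the Unflattener POD describes a -group_tag argument to unflatten_seq (for example -group_tag => 'locus_tag') for naming the tag explicitly; check the documentation of your installed version before relying on it.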

By the way, for this particular record you can get the exact same data
from Ensembl, already unflattened (or rather, never flattened into GenBank
format in the first place). Nevertheless, this sort of thing is extremely
useful for testing Unflattener.pm, so carry on testing! Really I should do
a full QC by comparing Ensembl-sourced GFF with the results of
ensembl->genbank->unflattener->gff, but I haven't got round to this yet.
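
As a rough first pass at that kind of QC, one could simply compare how many features of each type the two GFF outputs contain. The sketch below is core Perl only; count_types, diff_counts and the toy GFF lines are hypothetical, and a real check would also need to compare coordinates and parent/child links.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical QC helper: tally feature types (GFF column 3) in GFF text.
sub count_types {
    my ($gff_text) = @_;
    my %count;
    for my $line ( split /\n/, $gff_text ) {
        next if $line =~ /^#/ or $line !~ /\S/;    # skip comments, blanks
        my @col = split /\t/, $line;
        next unless @col >= 3;
        $count{ $col[2] }++;
    }
    return \%count;
}

# List every feature type whose count differs between the two tallies.
sub diff_counts {
    my ( $x, $y ) = @_;
    my %seen = map { $_ => 1 } keys %$x, keys %$y;
    return map { sprintf '%s: %d vs %d', $_, $x->{$_} || 0, $y->{$_} || 0 }
      grep { ( $x->{$_} || 0 ) != ( $y->{$_} || 0 ) }
      sort keys %seen;
}

# Toy data: the "direct" GFF has two mRNAs, the round trip only one.
my $direct = join "\n",
  join( "\t", 'chr', 'src', 'gene', 1, 100, '.', '+', '.', 'ID=g1' ),
  join( "\t", 'chr', 'src', 'mRNA', 1, 100, '.', '+', '.', 'ID=t1' ),
  join( "\t", 'chr', 'src', 'mRNA', 1, 90,  '.', '+', '.', 'ID=t2' );
my $roundtrip = join "\n",
  join( "\t", 'chr', 'src', 'gene', 1, 100, '.', '+', '.', 'ID=g1' ),
  join( "\t", 'chr', 'src', 'mRNA', 1, 100, '.', '+', '.', 'ID=t1' );

print "$_\n"
  for diff_counts( count_types($direct), count_types($roundtrip) );
# prints "mRNA: 2 vs 1"
```

An empty report only means the per-type counts agree; it says nothing about whether the features are nested the same way.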

Cheers
Chris

On Tue, 9 Dec 2003, Scott Cain wrote:

> Hello Chris,
>
> I am using Unflattener to create a genbank2gff script that is more
> robust than what we have now.  As one of my example Genbank files, I am
> using an A. gambiae chromosome:
>
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=nucleotide&list_uids=31249389&dopt=GenBank&term=NW_045730&qty=1
>
> When I try to run the simplified script below, I get the following
> error:
>
> ------------- EXCEPTION  -------------
> MSG: structure_type 2 is currently unknown
> STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/lib/perl5/site_perl/5.8.1/Bio/SeqFeature/Tools/Unflattener.pm:1345
> STACK toplevel ./simple.pl:19
>
> --------------------------------------
>
> As I read Unflattener, structure_type should only be set if I set it
> explicitly, right?  So how is it getting set here, and how do I make it
> stop?
>
> Here's the script:
> #!/usr/bin/perl -w
> use strict;
> use Bio::SeqIO;
> use Bio::SeqFeature::Tools::Unflattener;
>
> my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
>
> my $seqio = Bio::SeqIO->new(
>     -file   => 'NW_045730.1.gbk',
>     -format => 'GenBank'
> );
>
> open OUT, '>out.gff' or die "Cannot open out.gff: $!";
>
> while ( my $seq = $seqio->next_seq() ) {
>     my $acc = $seq->accession;
>
>     # get top level unflattened SeqFeatureI objects
>     my @sfs = $unflattener->unflatten_seq(
>         -seq       => $seq,
>         -use_magic => 1
>     );
>
>     foreach my $sf (@sfs) {
>         my $gffio =
>           $sf->gff_format( Bio::Tools::GFF->new( -gff_version => 3 ) );
>
>         $sf->seq_id($acc);
>
>         if ( $sf->primary_tag() eq 'source' ) {
>             $sf->add_tag_value( 'ID', $acc );
>             $sf->primary_tag('region');
>         }
>         print OUT $sf->gff_string . "\n";
>     }
> }
> close OUT;
> ---------------------------
>
> Thanks,
> Scott
>
>



