[Bioperl-l] Re: Problem with Unflattener

Scott Cain cain at cshl.org
Tue Dec 9 19:34:47 EST 2003


Thanks.

Re: GFF from Ensembl: you can get it as GFF3?  If so, could you send me a
link?  (You can tell I'm a little incredulous.)

Scott

On Tue, 2003-12-09 at 21:19, Chris Mungall wrote:
> Hi Scott
> 
> Bug squashed; do a cvs update and it should work.
> 
> The problem was that this record uses /locus_tag instead of /gene. The
> unflattener should be able to detect this in magic mode, but there was one
> place where "/gene" was hardcoded.
> 
> By the way, for this particular record you can get exactly the same data
> from Ensembl, already unflattened (or rather, never flattened into GenBank
> format in the first place). Nevertheless, this sort of record is extremely
> useful for testing Unflattener.pm, so carry on testing! Really, I should do
> a full QC by comparing Ensembl-sourced GFF with the results of
> ensembl->genbank->unflattener->gff, but I haven't got round to this yet.
> 
> Cheers
> Chris
> 
> On Tue, 9 Dec 2003, Scott Cain wrote:
> 
> > Hello Chris,
> >
> > I am using Unflattener to create a genbank2gff script that is more
> > robust than what we have now.  As one of my example Genbank files, I am
> > using an A. gambiae chromosome:
> >
> > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=nucleotide&list_uids=31249389&dopt=GenBank&term=NW_045730&qty=1
> >
> > When I try to run the simplified script below, I get the following
> > error:
> >
> > ------------- EXCEPTION  -------------
> > MSG: structure_type 2 is currently unknown
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/lib/perl5/site_perl/5.8.1/Bio/SeqFeature/Tools/Unflattener.pm:1345
> > STACK toplevel ./simple.pl:19
> >
> > --------------------------------------
> >
> > As I read Unflattener, structure_type should only be set if I set it
> > explicitly, right?  So how is it getting set here, and how do I make it
> > stop?
> >
> > Here's the script:
> > #!/usr/bin/perl -w
> > use strict;
> > use Bio::SeqIO;
> > use Bio::SeqFeature::Tools::Unflattener;
> > use Bio::Tools::GFF;    # used below via Bio::Tools::GFF->new
> >
> > my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
> >
> > my $seqio = Bio::SeqIO->new(
> >     -file   => 'NW_045730.1.gbk',
> >     -format => 'GenBank'
> > );
> >
> > open OUT, '>', 'out.gff' or die "Cannot open out.gff: $!";
> >
> > while ( my $seq = $seqio->next_seq() ) {
> >     my $acc = $seq->accession;
> >
> >     # get top-level unflattened SeqFeatureI objects
> >     my @sfs = $unflattener->unflatten_seq(
> >         -seq       => $seq,
> >         -use_magic => 1
> >     );
> >
> >     foreach my $sf (@sfs) {
> >         my $gffio =
> >           $sf->gff_format( Bio::Tools::GFF->new( -gff_version => 3 ) );
> >
> >         $sf->seq_id($acc);
> >
> >         if ( $sf->primary_tag() eq 'source' ) {
> >             $sf->add_tag_value( 'ID', $acc );
> >             $sf->primary_tag('region');
> >         }
> >         print OUT $sf->gff_string . "\n";
> >     }
> > }
> > close OUT;
> > ---------------------------
> >
> > Thanks,
> > Scott
> >
> >
> 
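Chris's explanation above comes down to not hardcoding the qualifier that ties a gene's features together: some records link them with /gene, others (like this one) with /locus_tag. The following is a minimal pure-Perl sketch of that idea, not the Unflattener code itself. The plain hashrefs, the made-up locus tags LT0001/LT0002, and the helpers pick_group_tag and group_by_tag are all hypothetical; real records come out of Bio::SeqIO as Bio::SeqFeatureI objects, not hashes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up flat features, standing in for what a GenBank parser yields.
# Each is a hashref with a primary tag, coordinates and qualifiers.
# Note that this "record" links features via /locus_tag, not /gene.
my @flat = (
    { type => 'gene', start => 100,  end => 900,  tags => { locus_tag => ['LT0001'] } },
    { type => 'mRNA', start => 100,  end => 900,  tags => { locus_tag => ['LT0001'] } },
    { type => 'CDS',  start => 150,  end => 850,  tags => { locus_tag => ['LT0001'] } },
    { type => 'gene', start => 2000, end => 3000, tags => { locus_tag => ['LT0002'] } },
    { type => 'CDS',  start => 2050, end => 2950, tags => { locus_tag => ['LT0002'] } },
);

# Hypothetical helper: pick the qualifier that links a gene's features
# instead of hardcoding 'gene'. Prefers /gene if any feature carries it,
# falls back to /locus_tag, and returns undef if neither is present.
sub pick_group_tag {
    my @features = @_;
    for my $candidate (qw(gene locus_tag)) {
        return $candidate
            if grep { exists $_->{tags}{$candidate} } @features;
    }
    return;
}

# Hypothetical helper: bucket features by the first value of that qualifier.
sub group_by_tag {
    my ( $tag, @features ) = @_;
    my %groups;
    for my $f (@features) {
        my $values = $f->{tags}{$tag} or next;    # skip features without it
        push @{ $groups{ $values->[0] } }, $f;
    }
    return \%groups;
}

my $tag    = pick_group_tag(@flat);
my $groups = group_by_tag( $tag, @flat );

for my $id ( sort keys %$groups ) {
    my @types = map { $_->{type} } @{ $groups->{$id} };
    print "$id: @types\n";
}
```

With these inputs it prints "LT0001: gene mRNA CDS" and "LT0002: gene CDS". A record that used /gene would be grouped on that qualifier instead, with no code change, which is the kind of detection magic mode is meant to do.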
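The QC Chris mentions, comparing Ensembl-sourced GFF with the output of ensembl->genbank->unflattener->gff, could start as a simple set comparison. This is a rough sketch under stated assumptions: both inputs are GFF3 lines for the same single sequence; features are matched only on type, start, end and strand (the seqid column is ignored because the two sources may name the sequence differently); and the sample lines and the helpers feature_keys and missing_from are made up. It does not check ID/Parent relationships, which is where unflattening errors would really show up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn GFF3 lines into a set of comparison keys.
# Assumes both inputs describe one sequence, so the seqid column (which
# each source may name differently) is ignored and features are matched
# on type, start, end and strand only.
sub feature_keys {
    my @lines = @_;
    my %keys;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;    # directives, comments, blanks
        my @col = split /\t/, $line;
        next unless @col == 9;                     # skip malformed rows
        $keys{ join "\t", @col[ 2, 3, 4, 6 ] }++;  # count per key; only presence is compared below
    }
    return \%keys;
}

# Hypothetical helper: keys present in the first set but not the second.
sub missing_from {
    my ( $x, $y ) = @_;
    my @missing = grep { !exists $y->{$_} } keys %$x;
    return sort @missing;
}

# Made-up sample data: an "Ensembl" GFF3 and a "round-trip" GFF3 that
# lost the mRNA somewhere along ensembl->genbank->unflattener->gff.
my @ensembl = (
    "##gff-version 3\n",
    "seqA\tEnsembl\tgene\t100\t900\t.\t+\t.\tID=g1\n",
    "seqA\tEnsembl\tmRNA\t100\t900\t.\t+\t.\tID=t1;Parent=g1\n",
);
my @roundtrip = (
    "##gff-version 3\n",
    "seqB\tGenBank\tgene\t100\t900\t.\t+\t.\tID=g1\n",
);

my $ens   = feature_keys(@ensembl);
my $rt    = feature_keys(@roundtrip);
my @lost  = missing_from( $ens, $rt );
my @extra = missing_from( $rt,  $ens );

print "only in Ensembl GFF:    $_\n" for @lost;
print "only in round-trip GFF: $_\n" for @extra;
```

Here the only difference reported is the mRNA (type, start, end and strand joined by tabs). Matching on coordinates rather than IDs is deliberate, since the two pipelines are unlikely to assign the same ID values.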
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
