[Bioperl-l] Re: Problem with Unflattener
Scott Cain
cain at cshl.org
Tue Dec 9 19:34:47 EST 2003
Thanks.
Re: GFF from ensembl: You can get it as GFF3? Could you send me a link
if so. (You can tell I'm a little incredulous.)
Scott
On Tue, 2003-12-09 at 21:19, Chris Mungall wrote:
> Hi Scott
>
> Bug squashed, do a cvs update and it should work
>
> The problem was that this record uses /locus_tag instead of /gene - the
> unflattener should be able to detect this in magic mode, but there was one
> place where "/gene" was hardcoded.
>
> By the way, for this particular record you can get the exact same data
> from ensembl, already unflattened (or rather, never flattened into genbank
> format in the first place). Nevertheless, this sort of thing is extremely
> useful for testing Unflattener.pm, so carry on testing! Really I should do
> a full QC by comparing ensembl sourced GFF and the results of
> ensembl->genbank->unflattener->gff, but I haven't got round to this yet.
>
> Cheers
> Chris
>
> On Tue, 9 Dec 2003, Scott Cain wrote:
>
> > Hello Chris,
> >
> > I am using Unflattener to create a genbank2gff script that is more
> > robust than what we have now. As one of my example Genbank files, I am
> > using an A. gambiae chromosome:
> >
> > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=nucleotide&list_uids=31249389&dopt=GenBank&term=NW_045730&qty=1
> >
> > When I try to run the simplified script below, I get the following
> > error:
> >
> > ------------- EXCEPTION -------------
> > MSG: structure_type 2 is currently unknown
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/lib/perl5/site_perl/5.8.1/Bio/SeqFeature/Tools/Unflattener.pm:1345
> > STACK toplevel ./simple.pl:19
> >
> > --------------------------------------
> >
> > As I read Unflattener, structure_type should only be set if I set it
> > explicitly, right? So how is it getting set here, and how do I make it
> > stop?
> >
> > Here's the script:
> > #!/usr/bin/perl -w
> > use strict;
> > use Bio::SeqIO;
> > use Bio::SeqFeature::Tools::Unflattener;
> >
> > my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
> >
> > my $seqio = Bio::SeqIO->new(
> > -file => 'NW_045730.1.gbk',
> > -format => 'GenBank'
> > );
> >
> > open OUT, '>out.gff';
> >
> > while ( my $seq = $seqio->next_seq() ) {
> > my $acc = $seq->accession;
> >
> > # get top level unflattended SeqFeatureI objects
> > my @sfs = $unflattener->unflatten_seq(
> > -seq => $seq,
> > -use_magic => 1
> > );
> >
> > foreach my $sf (@sfs) {
> > my $gffio =
> > $sf->gff_format( Bio::Tools::GFF->new( -gff_version => 3 ) );
> >
> > $sf->seq_id($acc);
> >
> > if ( $sf->primary_tag() eq 'source' ) {
> > $sf->add_tag_value( 'ID', $acc );
> > $sf->primary_tag('region');
> > }
> > print OUT $sf->gff_string . "\n";
> > }
> > }
> > close OUT;
> > ---------------------------
> >
> > Thanks,
> > Scott
> >
> >
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain at cshl.org
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory
More information about the Bioperl-l
mailing list