[Bioperl-l] Bio::DB::RefSeq and NC_007092

Tue Mar 2 23:16:03 UTC 2010

I see. I work mostly with bacteria, so mammalian chromosomes shouldn't be an issue. I picked NC_007092 at random to test my script; it came up in a simple search for Bacillus in the Genome database.
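
For pre-screening by accession type, a minimal sketch might look like the following. It only inspects the two-letter RefSeq prefix, so it needs no network access. The prefix table is abbreviated and paraphrases the RefSeq key page linked in the quoted reply below; the helper name refseq_molecule_type is hypothetical, and the meaning of some prefixes (NZ_ in particular) may differ from what is listed here.

```perl
use strict;
use warnings;

# Abbreviated prefix table (an assumption based on the RefSeq key page;
# check the page itself for the authoritative, current list).
my %REFSEQ_PREFIX = (
    NC => 'complete genomic molecule',
    NG => 'incomplete genomic region',
    NT => 'genomic contig or scaffold',
    NW => 'genomic contig or scaffold (mostly WGS)',
    NZ => 'whole genome shotgun / unfinished genome',
    NM => 'mRNA',
    NR => 'non-coding RNA',
    NP => 'protein',
    XM => 'predicted mRNA',
    XR => 'predicted non-coding RNA',
    XP => 'predicted protein',
);

# Hypothetical helper: return a description for a RefSeq-style accession,
# or undef when the accession has no known two-letter prefix.
sub refseq_molecule_type {
    my ($accession) = @_;
    return unless defined $accession;
    if ($accession =~ /^([A-Z]{2})_/) {
        return $REFSEQ_PREFIX{$1};
    }
    return;
}

# Example usage with a few accessions.
for my $acc (qw(NC_007092 NM_000546 U49845)) {
    my $type = refseq_molecule_type($acc);
    printf "%-12s %s\n", $acc,
        defined $type ? $type : 'no RefSeq prefix recognised';
}
```

A prefix check like this cannot tell how large a record is, so it is only a first filter.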

I'll look into checking the docsum first so that unexpectedly large files don't interrupt my script.
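
A rough sketch of that check, reusing only the Bio::DB::EUtilities calls from the quoted reply below, could look like this. It assumes the nucleotide docsum contains an item named 'Length' giving the sequence length in bases; that item name, the helper names docsum_length and is_too_large, and the 20,000,000-base cutoff are all assumptions for illustration. The length is in bases, so the GenBank flat file will be larger than that number of bytes.

```perl
use strict;
use warnings;

# Hypothetical cutoff in bases; tune it to the memory available.
my $MAX_LENGTH = 20_000_000;

# Pure check: true when a reported length exceeds the cutoff.
# Returns false for a missing or non-numeric length.
sub is_too_large {
    my ($length, $max) = @_;
    return 0 unless defined($length) && $length =~ /^\d+$/;
    return $length > $max ? 1 : 0;
}

# Look up the 'Length' item in the docsum for an accession.
# Assumes the docsum has an item with that name; returns undef otherwise.
sub docsum_length {
    my ($accession) = @_;
    require Bio::DB::EUtilities;    # loaded lazily; needs network access

    my $factory = Bio::DB::EUtilities->new(
        -eutil => 'esearch',
        -db    => 'nucleotide',
        -term  => $accession,
    );
    my ($id) = $factory->get_ids;
    return unless defined $id;

    $factory->reset_parameters(
        -eutil => 'esummary',
        -db    => 'nucleotide',
        -id    => $id,
    );
    my $ds = $factory->next_DocSum or return;
    while (my $item = $ds->next_Item('flattened')) {
        return $item->get_content if $item->get_name eq 'Length';
    }
    return;
}

# Query NCBI only when run as a script with an accession argument.
if (!caller() && @ARGV) {
    my $acc    = $ARGV[0];
    my $length = docsum_length($acc);
    if (!defined $length) {
        print "No length found in the docsum for $acc\n";
    }
    elsif (is_too_large($length, $MAX_LENGTH)) {
        print "Skipping $acc: $length bases exceeds $MAX_LENGTH\n";
    }
    else {
        print "OK to fetch $acc ($length bases)\n";
    }
}
```

Keeping the size decision in a separate function makes it easy to test without contacting NCBI.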

Thank you.

Veronica

> From: Russell.Smithies at agresearch.co.nz
> To: armendarez77 at hotmail.com; bioperl-l at lists.open-bio.org
> Date: Wed, 3 Mar 2010 12:08:51 +1300
> Subject: Re: [Bioperl-l] Bio::DB::RefSeq and NC_007092
> 
> NC_ accessions are complete genomic molecules (mostly chromosomes), so if you're unlucky enough to get a mammalian one, there's a fair chance it could be quite large.
> Take a look at this for accession number formats: http://www.ncbi.nlm.nih.gov/refseq/key.html
> 
> Also, it may help to check the docsum first to see how big the file is going to be
> (the full GenBank file for this example is only 6 MB).
> 
> ===================
> use Bio::DB::EUtilities;
> 
> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',-db => 'nucleotide',-term => 'NC_007092' );
> 
> my ($id) = $factory->get_ids;
> 
> # get a summary
> $factory->reset_parameters(-eutil => 'esummary',-db => 'nucleotide',-id => $id);
> my $ds = $factory->next_DocSum;
> print "ID: $id\n";
> # flattened mode
> while (my $item = $ds->next_Item('flattened'))  {
>     # not all Items have content, so need to check...
>     printf("%-20s:%s\n",$item->get_name,$item->get_content) if $item->get_content;
> }
> print "\n";
> 
> 
> # download the full genbank file
> $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
>                          -db => 'nucleotide',
>                          -id => $id,
>                          -rettype => 'gbwithparts');
> $factory->get_Response(-file => "$id.gb");
> 
> ================
> 
> Hope this helps,
> 
> Russell Smithies 
> 
> Bioinformatics Applications Developer 
> T +64 3 489 9085 
> E  russell.smithies at agresearch.co.nz 
> 
> Invermay  Research Centre 
> Puddle Alley, 
> Mosgiel, 
> New Zealand 
> T  +64 3 489 3809   
> F  +64 3 489 9174  
> www.agresearch.co.nz 
> 
> 
> 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of armendarez77 at hotmail.com
> > Sent: Wednesday, 3 March 2010 10:06 a.m.
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092
> > 
> > 
> > Hello,
> > 
> > I am writing a script to remotely access annotation files and parse
> > information using Bio::DB::RefSeq and Bio::DB::Genbank.  I was testing it
> > with random RefSeq accession numbers (NC_######) when something odd
> > happened.  When I used the accession number 'NC_007092', the script seemed
> > to freeze.  After some time, 'Out of Memory' was printed to the terminal.
> > 
> > When I investigated the annotation file associated with NC_007092, a
> > MapViewer page opened.  It turns out that NC_007092 is a genome shotgun
> > sequence, but it does not start with 'NZ' as I thought all shotgun
> > sequences did.
> > 
> > Is this a random event that I don't have to worry much about or is there a
> > way to pre-screen accession numbers to ensure they are associated with
> > complete genome RefSeq files?
> > 
> > I've included my script in case there is something I missed that could
> > have prevented this.
> > 
> > Thank you,
> > 
> > Veronica
> > 
> > 
> > _________________
> > 
> > use strict;
> > use warnings;
> > use Bio::Perl;
> > use Bio::DB::RefSeq;
> > use Bio::DB::GenBank;
> > use Getopt::Long;
> > use IO::Handle;
> > my $accessionNumber;
> > 
> > GetOptions("accessionNumber=s"=>\$accessionNumber);
> > unless($accessionNumber){
> >     print<<"OPTIONS";
> >     options for $0
> >     accessionNumber    -a    accession number
> > OPTIONS
> > die;
> > }
> > 
> > my $description = annotation_info($accessionNumber);
> > 
> > print "$description\n";
> > 
> > 
> > 
> > sub annotation_info{
> > 
> >     my $seqObj;
> > 
> >     my $accNum = shift(@_);
> > 
> >     my $rs = Bio::DB::RefSeq->new();
> >     my $gb = Bio::DB::GenBank->new();
> > 
> > 
> >     # RefSeq annotations include an underscore in their accession number
> >     if($accNum =~ /\w\w_\d{6}/){
> > 
> >         $seqObj = $rs->get_Seq_by_id($accNum);
> >     }
> >     elsif($accNum !~ /_/){ #GenBank annotation
> >         $seqObj = $gb->get_Seq_by_id($accNum);
> >     }
> > 
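> >     # Neither branch may have matched, or the lookup may have failed
> >     die "No sequence retrieved for '$accNum'\n" unless $seqObj;
> >     return $seqObj->desc();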
> > }
> > 
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
