[Bioperl-l] Bio::DB::RefSeq and NC_007092
armendarez77 at hotmail.com
Tue Mar 2 23:16:03 UTC 2010
I see. I work mostly with bacteria, so mammalian chromosomes shouldn't be an issue. I picked NC_007092 at random to test my script; it came up when I did a simple search for Bacillus in the Genome database.
I'll look into checking the DocSum first so that unexpectedly large files don't interrupt my script.
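Russell's suggestion below, checking the DocSum before downloading, can be kept separate from the network calls by putting the size decision in its own small function. The sketch below is a hypothetical helper, not part of BioPerl: it assumes the summary items have already been collected into a hash keyed by item name, and that the nucleotide summary includes a `Length` item in base pairs. The 20 Mb cap is an arbitrary illustration value, not an NCBI limit.

```perl
use strict;
use warnings;

# Hypothetical default cap, in base pairs. 20 Mb is an arbitrary example
# value, not an NCBI or BioPerl limit.
my $DEFAULT_MAX_BP = 20_000_000;

# ok_to_fetch(\%docsum_items, $max_bp)
# Returns 1 if the record's 'Length' item is a whole number no larger than
# $max_bp, and 0 otherwise. A missing or non-numeric Length is treated as
# "don't fetch" instead of being guessed at. The 'Length' item name is an
# assumption about the nucleotide esummary output.
sub ok_to_fetch {
    my ($items, $max_bp) = @_;
    $max_bp = $DEFAULT_MAX_BP unless defined $max_bp;
    my $len = $items->{Length};
    return 0 unless defined $len && $len =~ /^\d+$/;
    return $len <= $max_bp ? 1 : 0;
}

# Demonstration with made-up summary values (not real NCBI records).
for my $items ({ Length => 5_000_000 }, { Length => 250_000_000 }, {}) {
    my $shown = defined $items->{Length} ? $items->{Length} : 'n/a';
    printf "%-12s %s\n", $shown, ok_to_fetch($items) ? 'fetch' : 'skip';
}
```

To feed it, Russell's flattened loop could store `$items{$item->get_name} = $item->get_content` instead of printing each pair, and the efetch step would then run only when `ok_to_fetch(\%items)` is true. Refusing on a missing `Length` is a deliberately cautious choice; a script that prefers to try anyway could return 1 there instead.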
Thank you.
Veronica
> From: Russell.Smithies at agresearch.co.nz
> To: armendarez77 at hotmail.com; bioperl-l at lists.open-bio.org
> Date: Wed, 3 Mar 2010 12:08:51 +1300
> Subject: Re: [Bioperl-l] Bio::DB::RefSeq and NC_007092
>
> NC_ accessions are complete genomic molecules (mostly chromosomes), so if you're unlucky enough to get a mammalian one, there's a fair chance it could be quite large.
> Take a look at this for accession number formats: http://www.ncbi.nlm.nih.gov/refseq/key.html
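The accession-format key linked above can be turned into a cheap pre-screen that runs before any network request, which addresses Veronica's question about screening accession numbers. The following is a hypothetical sketch: the prefix table is a partial, hand-copied summary of that NCBI page (an assumption to re-check against the current page), and `refseq_category` is an illustrative name, not a BioPerl function. A prefix says what kind of record an accession is, not how large it is, so this complements a DocSum length check rather than replacing it.

```perl
use strict;
use warnings;

# Partial RefSeq prefix table, summarised from the NCBI RefSeq accession
# key page (assumed; verify against the current page before relying on it).
my %REFSEQ_PREFIX = (
    AC => 'genomic: complete molecule (usually alternate assembly)',
    NC => 'genomic: complete molecule (usually reference assembly)',
    NG => 'genomic: incomplete region',
    NT => 'genomic: contig or scaffold (clone-based or WGS)',
    NW => 'genomic: contig or scaffold (primarily WGS)',
    NZ => 'genomic: complete genome or unfinished WGS data',
    NM => 'mRNA',
    NR => 'non-coding RNA',
    XM => 'predicted mRNA',
    XR => 'predicted non-coding RNA',
    NP => 'protein',
    XP => 'predicted protein',
    YP => 'protein',
);

# Hypothetical helper: returns the category string for a RefSeq-style
# accession (two capital letters, underscore, alphanumerics, optional
# .version), or undef if the string is not RefSeq-shaped or the prefix
# is not in the table above.
sub refseq_category {
    my ($acc) = @_;
    return unless defined $acc;
    my ($prefix) = $acc =~ /^([A-Z]{2})_[A-Z0-9]+(?:\.\d+)?$/
        or return;
    return $REFSEQ_PREFIX{$prefix};
}

# Illustrative strings only; NZ_ABCD01000001 is a made-up WGS-style example
# and AE017194 stands in for any GenBank-style accession (no underscore).
for my $acc (qw(NC_007092 NZ_ABCD01000001 NM_000546.5 AE017194)) {
    my $cat = refseq_category($acc);
    printf "%-16s %s\n", $acc, defined $cat ? $cat : 'not a known RefSeq prefix';
}
```

Returning undef for unknown prefixes, rather than dying, lets the caller decide whether to fall back to a GenBank lookup or skip the accession.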
>
> Also, it may help to check the DocSum first to see how big the file is going to be.
> (the full Genbank file for this example is only 6MB in size)
>
> ===================
> use strict;
> use warnings;
> use Bio::DB::EUtilities;
>
> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',-db => 'nucleotide',-term => 'NC_007092' );
>
> my ($id) = $factory->get_ids;
>
> # get a summary
> $factory->reset_parameters(-eutil => 'esummary',-db => 'nucleotide',-id => $id);
> my $ds = $factory->next_DocSum;
> print "ID: $id\n";
> # flattened mode
> while (my $item = $ds->next_Item('flattened')) {
> # not all Items have content, so need to check...
> printf("%-20s:%s\n",$item->get_name,$item->get_content) if $item->get_content;
> }
> print "\n";
>
>
> # download the full genbank file
> $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
> -db => 'nucleotide',
> -id => $id,
> -rettype => 'gbwithparts');
> $factory->get_Response(-file => "$id.gb");
>
> ================
>
> Hope this helps,
>
> Russell Smithies
>
> Bioinformatics Applications Developer
> T +64 3 489 9085
> E russell.smithies at agresearch.co.nz
>
> Invermay Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T +64 3 489 3809
> F +64 3 489 9174
> www.agresearch.co.nz
>
>
>
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of armendarez77 at hotmail.com
> > Sent: Wednesday, 3 March 2010 10:06 a.m.
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092
> >
> >
> > Hello,
> >
> > I am writing a script to remotely access annotation files and parse
> > information using Bio::DB::RefSeq and Bio::DB::Genbank. I was testing it
> > with random RefSeq accession numbers (NC_######) when something odd
> > happened. When I used the accession number 'NC_007092', the script seemed
> > to freeze. After some time, 'Out of Memory' was printed to the terminal.
> >
> > When I investigated the annotation file associated with NC_007092, a
> > MapViewer page opened. It turns out that NC_007092 is a genome shotgun
> > sequence, but it does not start with 'NZ' as I thought all shotgun
> > sequences did.
> >
> > Is this a random event that I don't have to worry much about or is there a
> > way to pre-screen accession numbers to ensure they are associated with
> > complete genome RefSeq files?
> >
> > I've included my script in case there is something I missed that could
> > have prevented this.
> >
> > Thank you,
> >
> > Veronica
> >
> >
> > _________________
> >
> > use strict;
> > use warnings;
> > use Bio::Perl;
> > use Bio::DB::RefSeq;
> > use Bio::DB::GenBank;
> > use Getopt::Long;
> > use IO::Handle;
> >
> > my $accessionNumber;
> >
> > GetOptions("accessionNumber=s"=>\$accessionNumber);
> > unless($accessionNumber){
> > print<<"OPTIONS";
> > options for $0
> > accessionNumber -a accession number
> > OPTIONS
> > die;
> > }
> >
> > my $description = annotation_info($accessionNumber);
> >
> > print "$description\n";
> >
> >
> >
> > sub annotation_info{
> >
> > my $seqObj;
> >
> > my $accNum = shift(@_);
> >
> > my $rs = Bio::DB::RefSeq->new();
> > my $gb = Bio::DB::GenBank->new();
> >
> >
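> > # RefSeq accessions have a two-letter prefix and an underscore
> > # (e.g. NC_007092); GenBank accessions have no underscore.
> > if($accNum =~ /^[A-Z]{2}_[A-Z0-9]+(?:\.\d+)?$/){
> >
> > $seqObj = $rs->get_Seq_by_id($accNum);
> > }
> > elsif($accNum !~ /_/){ #GenBank annotation
> > $seqObj = $gb->get_Seq_by_acc($accNum);
> > }
> > else {
> > die "Unrecognized accession format: $accNum\n";
> > }
> >
> > die "No sequence returned for $accNum\n" unless $seqObj;
> > return $seqObj->desc();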
> > }
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l