[Bioperl-l] Bio::DB::RefSeq and NC_007092

Smithies, Russell Russell.Smithies at agresearch.co.nz
Tue Mar 2 23:08:51 UTC 2010


NC_ accessions are complete genomic molecules (mostly chromosomes), so if you're unlucky enough to get a mammalian one, there's a fair chance it will be very large.
Take a look at this for accession number formats: http://www.ncbi.nlm.nih.gov/refseq/key.html
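
As a rough way to pre-screen accessions before fetching anything, the two-letter prefix can be looked up in a table. This is only a sketch: refseq_class() is a made-up helper name, not part of BioPerl, and the table is my reading of the NCBI key page above, so check it against the current list.

```perl
use strict;
use warnings;

# Molecule class per RefSeq accession prefix (transcribed from the NCBI key;
# treat as an assumption and verify against the current page).
my %REFSEQ_PREFIX = (
    AC => 'complete genomic molecule (alternate assembly)',
    NC => 'complete genomic molecule (chromosome, plasmid, organelle)',
    NG => 'incomplete genomic region',
    NT => 'genomic contig or scaffold',
    NW => 'genomic contig or scaffold (mostly WGS)',
    NZ => 'complete genome or unfinished WGS data',
    NM => 'mRNA',
    NR => 'non-coding RNA',
    XM => 'predicted mRNA',
    XR => 'predicted non-coding RNA',
    NP => 'protein',
    XP => 'predicted protein',
);

# Hypothetical helper: returns the molecule class for a RefSeq-shaped
# accession, or undef if the string is not RefSeq-shaped or the prefix
# is not in the table.
sub refseq_class {
    my ($acc) = @_;
    return unless defined $acc;
    my ($prefix) = $acc =~ /\A([A-Z]{2})_[A-Z]*\d+(?:\.\d+)?\z/ or return;
    return $REFSEQ_PREFIX{$prefix};
}

for my $acc (qw(NC_007092 NZ_AAAA01000001 NM_000546.6 U49845)) {
    my $class = refseq_class($acc) // 'not a known RefSeq prefix';
    printf "%-16s %s\n", $acc, $class;
}
```

The prefix only tells you the kind of record, not its size, so an NC_ hit still deserves a size check before you download it.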

Also, it may help to check the docsum first to see how big the record is going to be.
(The full GenBank file for this example is only 6MB in size.)

===================
use strict;
use warnings;
use Bio::DB::EUtilities;

my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',-db => 'nucleotide',-term => 'NC_007092' );

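# take the first matching ID; stop early if the search found nothing
my ($id) = $factory->get_ids or die "No record found for NC_007092\n";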

# get a summary
$factory->reset_parameters(-eutil => 'esummary',-db => 'nucleotide',-id => $id);
my $ds = $factory->next_DocSum;
print "ID: $id\n";
# flattened mode
while (my $item = $ds->next_Item('flattened'))  {
    # not all Items have content, so need to check...
    printf("%-20s:%s\n",$item->get_name,$item->get_content) if $item->get_content;
}
print "\n";


# download the full genbank file
$factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                         -db => 'nucleotide',
                         -id => $id,
                         -rettype => 'gbwithparts');
$factory->get_Response(-file => "$id.gb");

================
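
If you want the script to decide for itself, something like the following could sit between the esummary and efetch steps. It is only a sketch: docsum_length() and small_enough() are made-up helper names, the item name 'Length' (sequence length in bases) is what I'd expect in a nucleotide docsum but is an assumption, and any size limit you pick is arbitrary. Pass it a fresh $ds rather than one already walked by the printing loop, since next_Item is an iterator.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a docsum's flattened Items for the sequence length.
# $ds is any object with next_Item(), whose items offer get_name/get_content
# (the same calls used in the script above).
sub docsum_length {
    my ($ds) = @_;
    while (my $item = $ds->next_Item('flattened')) {
        next unless $item->get_name eq 'Length';   # item name is an assumption
        return $item->get_content;
    }
    return;   # no Length item found
}

# Hypothetical helper: true only when the length is known and within the limit.
sub small_enough {
    my ($length_bp, $max_bp) = @_;
    return (defined($length_bp) && $length_bp <= $max_bp) ? 1 : 0;
}
```

In the script you would call docsum_length($ds) and only run the efetch step when small_enough() agrees. Sequence length is just a proxy here: the GenBank file with annotation will be larger than the sequence alone.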

Hope this helps,

Russell Smithies 

Bioinformatics Applications Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 

Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz 




> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of armendarez77 at hotmail.com
> Sent: Wednesday, 3 March 2010 10:06 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092
> 
> 
> Hello,
> 
> I am writing a script to remotely access annotation files and parse
> information using Bio::DB::RefSeq and Bio::DB::GenBank.  I was testing it
> with random RefSeq accession numbers (NC_######) when something odd
> happened.  When I used the accession number 'NC_007092', the script seemed
> to freeze.  After some time, 'Out of Memory' was printed to the terminal.
> 
> When I investigated the annotation file associated with NC_007092, a
> MapViewer page opened.  It turns out that NC_007092 is a genome shotgun
> sequence, but it does not start with 'NZ' as I thought all shotgun
> sequences did.
> 
> Is this a random event that I don't have to worry much about or is there a
> way to pre-screen accession numbers to ensure they are associated with
> complete genome RefSeq files?
> 
> I've included my script in case there is something I missed that could
> have prevented this.
> 
> Thank you,
> 
> Veronica
> 
> 
> _________________
> 
> use strict;
> use Bio::Perl;
> use Getopt::Long;
> use IO::Handle;
> 
> my $accessionNumber;
> 
> GetOptions("accessionNumber=s"=>\$accessionNumber);
> unless($accessionNumber){
>     print<<"OPTIONS";
>     options for $0
>     accessionNumber    -a    accession number
> OPTIONS
> die;
> }
> 
> my $description = annotation_info($accessionNumber);
> 
> print "$description\n";
> 
> 
> 
> sub annotation_info{
> 
>     my $seqObj;
> 
>     my $accNum = shift(@_);
> 
>     my $rs = Bio::DB::RefSeq->new();
>     my $gb = Bio::DB::GenBank->new();
> 
> 
>     if($accNum =~ /\w\w_\d{6}/){ #RefSeq annotations include an underscore in their accession number
> 
>         $seqObj = $rs->get_Seq_by_id($accNum);
>     }
>     elsif($accNum !~ /_/){ #GenBank annotation
>         $seqObj = $gb->get_Seq_by_id($accNum);
>     }
> 
>     return $seqObj->desc();
> }
> 
> 