[Bioperl-l] Bio::DB::RefSeq and NC_007092
armendarez77 at hotmail.com
armendarez77 at hotmail.com
Tue Mar 2 21:06:17 UTC 2010
Hello,
I am writing a script to remotely access annotation files and parse information using Bio::DB::RefSeq and Bio::DB::Genbank. I was testing it with random RefSeq accession numbers (NC_######) when something odd happened. When I used the accession number 'NC_007092', the script seemed to freeze. After some time, 'Out of Memory' was printed to the terminal.
When I investigated the annotation file associated with NC_007092, a MapViewer page opened. It turns out that NC_007092 is a genome shotgun sequence, but it does not start with 'NZ' as I though all shotgun sequences did.
Is this a random event that I don't have to worry much about or is there a way to pre-screen accession numbers to ensure they are associated with complete genome RefSeq files?
I've included my script in case there is something I missed that could have prevented this.
Thank you,
Veronica
_________________
use strict;
use Bio::Perl;
use Getopt::Long;
use IO::Handle;
my $accessionNumber;
GetOptions("accessionNumber=s"=>\$accessionNumber);
unless($accessionNumber){
print<<"OPTIONS";
options for $0
accessionNumber -a accession number
OPTIONS
die;
}
my $description = annotation_info($accessionNumber);
print "$description\n";
sub annotation_info{
my $seqObj;
my $accNum = shift(@_);
my $rs = Bio::DB::RefSeq->new();
my $gb = Bio::DB::GenBank->new();
if($accNum =~ /\w\w_\d{6}/){ #RefSeq annotations include an underscore in their accession number
$seqObj = $rs->get_Seq_by_id($accNum);
}
elsif($accNum !~ /_/){ #GenBank annotation
$seqObj = $gb->get_Seq_by_id($accNum);
}
return $seqObj->desc();
}
_________________________________________________________________
Hotmail: Trusted email with Microsoft’s powerful SPAM protection.
http://clk.atdmt.com/GBL/go/201469226/direct/01/
More information about the Bioperl-l
mailing list