[Bioperl-l] get_Stream_by_query() appears to return empty stream
Felix Horns
rfhorns at gmail.com
Fri Nov 2 00:01:34 UTC 2012
Hello everyone.
I am having trouble using the get_Stream_by_query() function
in Bio::DB::GenBank. It seems to return an empty stream, such that
$stream->next_seq never returns anything.
However, $query->count is returning the expected value (139). Also,
get_Stream_by_query() seems to be querying the database, as when I pass it
an array of GeneIDs that have not been properly formatted, i.e.
GeneID:7816864, instead of simply 7816864, it returns warnings and errors:
"MSG: Warning(s) from GenBank: <PhraseNotFound>GeneID 7817709...; MSG:
Error from Genbank: No items found.".
I have included my full code below. I have also included the output from
the code below that. The code is intended to find genes located within a
genomic region. I will later find the protein domains and pathways that
those genes are involved in.
Any help would be greatly appreciated. I realize that this is probably a
very simple question, but I am relatively new to BioPerl and I've spent the
better part of the day trying to figure out such issues, so I would be very
thankful for help.
Felix
#!/usr/bin/perl
use strict;
use Bio::SeqIO;
use Bio::DB::EntrezGene;
use Bio::DB::GenBank;
# Load reference sequence
# Load from local .gb file
# Note that .gb file does not include sequences
# my $gbfile = "NC_012660.1.gb";
# my $seqio = Bio::SeqIO->new(-file => $gbfile);
# my $ref_seq = $seqio->next_seq;
# To access reference sequence programatically, uncomment this code
my $gb = new Bio::DB::GenBank;
my $ref_seq = $gb->get_Seq_by_acc("NC_012660.1");
# Specify coordinates of gap
my $gap_start = 2050506;
my $gap_end = 2190530;
my $gene_count = 0;
my @features;
my @starts;
my @ends;
my @db_xrefs;
my @products;
my @protein_ids;
# Get gene features in gap
for my $feat ($ref_seq->get_SeqFeatures) {
my $start=$feat->location->start;
my $end=$feat->location->end;
if (($feat->primary_tag eq 'gene') &
($gap_start < $start) & ($start < $gap_end) &
($gap_start < $end) & ($end < $gap_end)) {
$gene_count += 1;
# Get GeneID reference
my $db_xref = ($feat->get_tag_values('db_xref'))[0];
$db_xref =~ s/GeneID://; # Trim "GeneID:" from start of $db_xref
push @features, $feat;
push @starts, $start;
push @ends, $end;
push @db_xrefs, $db_xref;
}
}
# Get data about gene features from GeneID reference
my $query = Bio::DB::Query::GenBank->new(-db => 'gene',
-ids => [@db_xrefs]);
my $stream = $gb->get_Stream_by_query($query);
while (my $seq = $stream->next_seq) {
for my $feat ($seq->all_SeqFeatures) {
print "primary tag: ", $feat->primary_tag, "\n";
for my $tag ($feat->get_all_tags) {
print " tag: ", $tag, "\n";
for my $value ($feat->get_tag_values($tag)) {
print " value: ", $value, "\n";
}
}
}
}
print $query->count,"\n";
print $gene_count, "\n";
OUTPUT
> perl analyze_gap.pl
139
139
Note that no "primary tag; tag; value" items are printed. Furthermore,
when I put a print line immediately after the (while (my $seq =
$stream->next_seq)) statement, it was never called, seemingly indicating
that the stream is empty.
More information about the Bioperl-l
mailing list