[Bioperl-l] get_Stream_by_query() appears to return empty stream

Felix Horns rfhorns at gmail.com
Fri Nov 2 00:01:34 UTC 2012


Hello everyone.

I am having trouble using the get_Stream_by_query() function
in Bio::DB::GenBank.  It seems to return an empty stream, such that
$stream->next_seq never returns anything.

However, $query->count is returning the expected value (139).  Also,
get_Stream_by_query() seems to be querying the database, as when I pass it
an array of GeneIDs that have not been properly formatted, i.e.
GeneID:7816864, instead of simply 7816864, it returns warnings and errors:
"MSG: Warning(s) from GenBank: <PhraseNotFound>GeneID 7817709...; MSG:
Error from Genbank: No items found.".

I have included my full code below. I have also included the output from
the code below that.  The code is intended to find genes located within a
genomic region. I will later find the protein domains and pathways that
those genes are involved in.

Any help would be greatly appreciated.  I realize that this is probably a
very simple question, but I am relatively new to BioPerl and I've spent the
better part of the day trying to figure out such issues, so I would be very
thankful for help.

Felix


#!/usr/bin/perl
use strict;
use Bio::SeqIO;
use Bio::DB::EntrezGene;
use Bio::DB::GenBank;

# Load reference sequence
# Load from local .gb file
# Note that .gb file does not include sequences
# my $gbfile = "NC_012660.1.gb";
# my $seqio = Bio::SeqIO->new(-file => $gbfile);
# my $ref_seq = $seqio->next_seq;

# To access reference sequence programatically, uncomment this code
my $gb = new Bio::DB::GenBank;
my $ref_seq = $gb->get_Seq_by_acc("NC_012660.1");

# Specify coordinates of gap
my $gap_start = 2050506;
my $gap_end = 2190530;

my $gene_count = 0;
my @features;
my @starts;
my @ends;
my @db_xrefs;

my @products;
my @protein_ids;

# Get gene features in gap
for my $feat ($ref_seq->get_SeqFeatures) {
  my $start=$feat->location->start;
  my $end=$feat->location->end;

  if (($feat->primary_tag eq 'gene') &
      ($gap_start < $start) & ($start < $gap_end) &
      ($gap_start < $end) & ($end < $gap_end)) {

    $gene_count += 1;

    # Get GeneID reference
    my $db_xref = ($feat->get_tag_values('db_xref'))[0];
    $db_xref =~ s/GeneID://;    # Trim "GeneID:" from start of $db_xref

    push @features, $feat;
    push @starts, $start;
    push @ends, $end;
    push @db_xrefs, $db_xref;
  }
}

# Get data about gene features from GeneID reference
my $query = Bio::DB::Query::GenBank->new(-db => 'gene',
 -ids => [@db_xrefs]);
my $stream = $gb->get_Stream_by_query($query);

while (my $seq = $stream->next_seq) {
  for my $feat ($seq->all_SeqFeatures) {
    print "primary tag: ", $feat->primary_tag, "\n";
    for my $tag ($feat->get_all_tags) {
      print "  tag: ", $tag, "\n";
      for my $value ($feat->get_tag_values($tag)) {
print "    value: ", $value, "\n";
      }
    }
  }
}

print $query->count,"\n";
print $gene_count, "\n";


OUTPUT
> perl analyze_gap.pl
139
139

Note that no "primary tag; tag; value" items are printed.  Furthermore,
when I put a print line immediately after the (while (my $seq =
$stream->next_seq)) statement, it was never called, seemingly indicating
that the stream is empty.



More information about the Bioperl-l mailing list