[Bioperl-l] get_Stream_by_query() appears to return empty stream
Brian Osborne
bosborne11 at verizon.net
Tue Nov 20 23:50:00 UTC 2012
Felix,
I took a look at the Bio::DB::Query::GenBank documentation, it says this:
If you provide an array reference of IDs in -ids, the query will be ignored and the list of IDs will be used when the query is passed to a Bio::DB::GenBank object's get_Stream_by_query() method.
Bio::DB::Genbank queries "nucleotide", by default. You have GeneIDs. I see that you're setting "-id" to "gene" but note that you're passing that query to a plain Bio::DB::GenBank object. Not sure what the expected behavior is here.
I would try using the NCBI Eutilities for that second query, rather than Bio::DB::Query::GenBank (http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook).
Brian O.
On Nov 1, 2012, at 8:01 PM, Felix Horns <rfhorns at gmail.com> wrote:
> Hello everyone.
>
> I am having trouble using the get_Stream_by_query() function
> in Bio::DB::GenBank. It seems to return an empty stream, such that
> $stream->next_seq never returns anything.
>
> However, $query->count is returning the expected value (139). Also,
> get_Stream_by_query() seems to be querying the database, as when I pass it
> an array of GeneIDs that have not been properly formatted, i.e.
> GeneID:7816864, instead of simply 7816864, it returns warnings and errors:
> "MSG: Warning(s) from GenBank: <PhraseNotFound>GeneID 7817709...; MSG:
> Error from Genbank: No items found.".
>
> I have included my full code below. I have also included the output from
> the code below that. The code is intended to find genes located within a
> genomic region. I will later find the protein domains and pathways that
> those genes are involved in.
>
> Any help would be greatly appreciated. I realize that this is probably a
> very simple question, but I am relatively new to BioPerl and I've spent the
> better part of the day trying to figure out such issues, so I would be very
> thankful for help.
>
> Felix
>
>
> #!/usr/bin/perl
> use strict;
> use Bio::SeqIO;
> use Bio::DB::EntrezGene;
> use Bio::DB::GenBank;
>
> # Load reference sequence
> # Load from local .gb file
> # Note that .gb file does not include sequences
> # my $gbfile = "NC_012660.1.gb";
> # my $seqio = Bio::SeqIO->new(-file => $gbfile);
> # my $ref_seq = $seqio->next_seq;
>
> # To access reference sequence programatically, uncomment this code
> my $gb = new Bio::DB::GenBank;
> my $ref_seq = $gb->get_Seq_by_acc("NC_012660.1");
>
> # Specify coordinates of gap
> my $gap_start = 2050506;
> my $gap_end = 2190530;
>
> my $gene_count = 0;
> my @features;
> my @starts;
> my @ends;
> my @db_xrefs;
>
> my @products;
> my @protein_ids;
>
> # Get gene features in gap
> for my $feat ($ref_seq->get_SeqFeatures) {
> my $start=$feat->location->start;
> my $end=$feat->location->end;
>
> if (($feat->primary_tag eq 'gene') &
> ($gap_start < $start) & ($start < $gap_end) &
> ($gap_start < $end) & ($end < $gap_end)) {
>
> $gene_count += 1;
>
> # Get GeneID reference
> my $db_xref = ($feat->get_tag_values('db_xref'))[0];
> $db_xref =~ s/GeneID://; # Trim "GeneID:" from start of $db_xref
>
> push @features, $feat;
> push @starts, $start;
> push @ends, $end;
> push @db_xrefs, $db_xref;
> }
> }
>
> # Get data about gene features from GeneID reference
> my $query = Bio::DB::Query::GenBank->new(-db => 'gene',
> -ids => [@db_xrefs]);
> my $stream = $gb->get_Stream_by_query($query);
>
> while (my $seq = $stream->next_seq) {
> for my $feat ($seq->all_SeqFeatures) {
> print "primary tag: ", $feat->primary_tag, "\n";
> for my $tag ($feat->get_all_tags) {
> print " tag: ", $tag, "\n";
> for my $value ($feat->get_tag_values($tag)) {
> print " value: ", $value, "\n";
> }
> }
> }
> }
>
> print $query->count,"\n";
> print $gene_count, "\n";
>
>
> OUTPUT
>> perl analyze_gap.pl
> 139
> 139
>
> Note that no "primary tag; tag; value" items are printed. Furthermore,
> when I put a print line immediately after the (while (my $seq =
> $stream->next_seq)) statement, it was never called, seemingly indicating
> that the stream is empty.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list