[Bioperl-l] can't retrieve description using Bio::DB::EntrezGene

Sun Jul 17 20:38:33 UTC 2011

Hi

You should use pastebin (http://pastebin.com/) for such long pieces of
code. Well, there are many ways to go around your problem. Which one is
best depends on the amount of info you need from the gene entry and how
much you really need to use Bio::DB::EntrezGene (note that
Bio::DB::EntrezGene doesn't actually return the DNA sequence. You'll
still need Bio::DB::EUtilities or Bio::DB::GenBank to get it).

1) If the module you use is not important, you only need part of the
info from the record (gene coordinates and contig accession number,
name, etc...) and will later need to actually fetch the DNA sequences,
use esummary from eutilities. This will make the code faster (since it
downloads less information from the database) and easier to read. If
you want to know what info is available from there, run the following
code, which prints what you get for the gene 3014.

use feature 'say';   # 'say' is not enabled by default
use Bio::DB::EUtilities;
my $eutil = Bio::DB::EUtilities->new(
                                     -eutil => 'esummary',
                                     -db    => 'gene',
                                     -id    => [3014],
                                    );
say $eutil->next_DocSum->to_string;

For genes whose UID has been replaced (such as 724021), there will be
a value for 'CurrentID' and 'Status' will be 1. You can also use this
module to get the actual sequences if you need them later:
http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F

2) If you really want to use Bio::DB::EntrezGene and the name is all
you want to extract from the record (this is the answer to your
original question), then here's how to fix your code. The field you're
looking for is in the Bio::Seq object (in $gene, not in $uncaptured as
you thought). It is, however, in the annotations of the sequence. So
you need to get the annotation collection, then use the right key to
get the annotation you want. It will come as a hash tree, so again
you'll need a key. The following code should work. It's quite
confusing, but take a look at this:
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Getting_the_Annotations

my $gene        = $genein->next_seq;
my $annotations = $gene->annotation;
my ($anno)      = $annotations->get_Annotations('Official Full Name');
my $hash_ref    = $anno->hash_tree;
my ($key)       = keys %{$hash_ref};
my $name        = $hash_ref->{$key};
say $name;

If you ever get confused, use the Data::Dumper module to see where
things are. For example, with the code above you could do the following:

my $gene        = $genein->next_seq;
use Data::Dumper;
print Dumper $gene;
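
A real Bio::Seq dump can run to thousands of lines, so here is a rough
sketch of how to keep the output manageable. The %record hash is made up
for illustration and is not real EntrezGene output. Sortkeys, Indent and
Maxdepth are standard Data::Dumper settings.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical nested structure, only loosely shaped like a parsed record
my %record = (
    label      => 'Nomenclature',
    properties => [
        { label => 'Official Symbol',    text => 'SYMBOL' },
        { label => 'Official Full Name', text => 'example full name' },
    ],
);

$Data::Dumper::Sortkeys = 1;   # stable key order between runs
$Data::Dumper::Indent   = 1;   # less horizontal space per nesting level

# Full dump: everything, all the way down
my $full = Dumper(\%record);
print $full;

# Shallow dump: stop descending after two levels, so the hashes inside
# 'properties' are printed as plain 'HASH(0x...)' strings
$Data::Dumper::Maxdepth = 2;
my $shallow = Dumper(\%record);
print $shallow;
```

Start with a small Maxdepth to find the branch you care about, then dump
only that branch in full.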

3) Also, if you want to use Bio::DB::EntrezGene but want to extract
more, I wouldn't look into the Bio::Seq object but into the
Bio::ASN1::EntrezGene object directly. It's still a mess, so use
Data::Dumper to find your way if you ever get lost in it. Here's the
code to do it that way ($response here is a string with all of the
ASN1 entrezgene file, newlines and everything):

    use Bio::ASN1::EntrezGene;
    my ($name, $symbol);   ## filled in by the loop below
    ## This use of the open function requires perl 5.8.0 or later
    open(my $seq_fh, "<", \$response)
      or die "Could not open sequences string for reading: $!";
    my $parser = Bio::ASN1::EntrezGene->new(-fh => $seq_fh);
    while(my $result = $parser->next_seq){
      $result = $result->[0] if(ref($result) eq 'ARRAY');
      ## Data::Dumper can be used to look into the structure and find
      ## where things are
#      use Data::Dumper;
#      print Dumper ($result);

      foreach my $p (@{$result->{'properties'}}){
        $p = $p->[0] if(ref($p) eq 'ARRAY');
        next unless ($p->{'label'} && $p->{'label'} eq 'Nomenclature');
        foreach my $pp (@{$p->{'properties'}}){
          $pp     = $pp->[0] if(ref($pp) eq 'ARRAY');
          $name   = $pp->{'text'}
            if ($pp->{'label'} && $pp->{'label'} eq 'Official Full Name');
          $symbol = $pp->{'text'}
            if ($pp->{'label'} && $pp->{'label'} eq 'Official Symbol');
        }
      }
    }
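
If you want to try the nested loop above without downloading anything,
here is a minimal sketch that wraps the same walk in a sub and runs it on
a hand-made structure. The sub name nomenclature() and everything in
$mock are hypothetical. Real parser output has many more fields, so check
it with Data::Dumper before relying on this shape.

```perl
use strict;
use warnings;

# Collect every label/text pair found under the 'Nomenclature' property.
# Nodes are sometimes wrapped in a one-element array, so unwrap those.
sub nomenclature {
    my ($record) = @_;
    my %found;
    foreach my $p (@{ $record->{'properties'} || [] }) {
        my $node = ref($p) eq 'ARRAY' ? $p->[0] : $p;
        next unless ($node->{'label'} && $node->{'label'} eq 'Nomenclature');
        foreach my $pp (@{ $node->{'properties'} || [] }) {
            my $leaf = ref($pp) eq 'ARRAY' ? $pp->[0] : $pp;
            $found{ $leaf->{'label'} } = $leaf->{'text'} if $leaf->{'label'};
        }
    }
    return \%found;
}

# Hypothetical record with placeholder values, not real EntrezGene data
my $mock = {
    properties => [
        { label => 'Something else', text => 'ignored' },
        [ {
            label      => 'Nomenclature',
            properties => [
                { label => 'Official Symbol', text => 'SYMBOL' },
                [ { label => 'Official Full Name', text => 'example full name' } ],
            ],
        } ],
    ],
};

my $names = nomenclature($mock);
print "$names->{'Official Symbol'}: $names->{'Official Full Name'}\n";
```

Returning a hash keyed by label means you don't need one variable per
field, which helps once you want more than the name and the symbol.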

I use this piece of code in the program that I'm currently writing. I
also extract a lot more from the entrezgene file. Take a look at
http://pastebin.com/LVB7QpxZ if you want to make some sense of it.
This part is between lines 381 and 399. I already wrote part of what
you're writing there, so there's no need to reinvent the wheel.
Hopefully it is commented well enough for you to understand.

Carnë