[Bioperl-l] How (from where) to retrieve FieldInfo objects?

Sat Jul 2 02:36:12 UTC 2011

2011/6/30 Carnë Draug <carandraug+dev at gmail.com>:
> On 29 June 2011 23:30, Smithies, Russell
> <Russell.Smithies at agresearch.co.nz> wrote:
>> How about just returning ASN.1 then parsing that?
>> There's far more data in that format than any of the others.
>>
>> my $factory = Bio::DB::EUtilities->new(-eutil      => 'esearch',
>>                                       -term       => 'h2afx[sym] AND human[organism]',
>>                                       -db         => 'gene',
>>                                       -usehistory => 'y');
>>
>>
>> my $hist  = $factory->next_History || die "No history data returned";
>>
>> $factory->set_parameters(-eutil   => 'efetch',-history => $hist);
>>
>> print Dumper $factory->get_Response;
>
> When I do this, I get XML with the ASN.1 inside a <pre> tag. Is it
> supposed to be this way? Should I extract it myself? Shouldn't the
> method do this? It's nice that I can get so much information, but
> wouldn't it be lighter on the NCBI server if I could ask only for the
> info that I need rather than the whole record?

After much work, I got this working, so I'm sharing the code in case
someone else comes across the same problem. Basically, get_Response
returns an HTTP::Response object (a subclass of HTTP::Message), and its
content method gives the raw body. Since I couldn't find a method that
returns just the record, I used HTML::Parser to extract it. It seems
that the ASN.1/entrezgene data is all inside the <pre> tag. Also, if
there's more than one gene, all genes are inside the same <pre> tag.
Here's the code I used.

use strict;
use warnings;
use Bio::DB::EUtilities;
use HTML::Parser;

my @ids = qw(9555 3014);
my $factory = Bio::DB::EUtilities->new(
                                      -eutil   => 'efetch',
                                      -db      => 'gene',
                                      -id      => \@ids,
                                      -retmode => 'asn1',
                                      );
# get_Response returns an HTTP::Response; content is the raw body
my $html = $factory->get_Response->content;

my $parser = HTML::Parser->new(
                                api_version => 3,
                                start_h     => [\&handle_start],
                                end_h       => [\&handle_end],
                                text_h      => [\&handle_text, 'dtext'],
                                # only report start/end events for <pre>
                                report_tags => [qw(pre)],
                              );
my $seq = '';
{
  my $inside_tag = 0;
  sub handle_start {
    $inside_tag = 1;
  }
  sub handle_text {
    # text can arrive in several chunks, so append instead of assigning
    $seq .= $_[0] if $inside_tag;
  }
  sub handle_end {
    $inside_tag = 0;
  }
}
$parser->parse($html);
$parser->eof;   # signal end of document so buffered text is flushed

After parsing, $seq holds the ASN.1 (entrezgene) text for all the
requested genes. It can be written to disk as is, or read with
Bio::SeqIO using the entrezgene format.
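
For a response this simple (one <pre> block, no nesting), a lighter
alternative might be a plain regex instead of HTML::Parser. The sketch
below is only an illustration under that assumption: extract_pre is a
made-up helper name (not part of BioPerl), the sample HTML is an
invented stand-in for the real efetch body, and only four common
entities are decoded. It also shows opening the extracted text as an
in-memory filehandle, so it can be read like a file without writing a
temporary file first.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl or HTML::Parser): return the
# text of every <pre>...</pre> block in $html, decoding a few entities.
sub extract_pre {
    my ($html) = @_;
    my @blocks;
    while ($html =~ m{<pre\b[^>]*>(.*?)</pre\s*>}gis) {
        my $text = $1;
        $text =~ s/&lt;/</g;
        $text =~ s/&gt;/>/g;
        $text =~ s/&quot;/"/g;
        $text =~ s/&amp;/&/g;   # decode &amp; last to avoid double decoding
        push @blocks, $text;
    }
    return @blocks;
}

# Invented placeholder standing in for $factory->get_Response->content.
my $html = qq{<html><body><pre>Placeholder ::= {\n}</pre></body></html>};
my ($asn1) = extract_pre($html);
$asn1 = '' unless defined $asn1;   # no <pre> block found

# Open the string as a read-only, in-memory filehandle.
open my $fh, '<', \$asn1 or die "Cannot open in-memory handle: $!";
my @lines = <$fh>;
close $fh;
print scalar(@lines), " line(s) extracted\n";
```

The same kind of handle could presumably be passed to Bio::SeqIO through
its -fh argument together with -format => 'entrezgene', but I have not
tested that here.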

Carnë