[Bioperl-l] How (from where) to retrieve FieldInfo objects?
Carnë Draug
carandraug+dev at gmail.com
Sat Jul 2 02:36:12 UTC 2011
2011/6/30 Carnë Draug <carandraug+dev at gmail.com>:
> On 29 June 2011 23:30, Smithies, Russell
> <Russell.Smithies at agresearch.co.nz> wrote:
>> How about just returning ASN.1 then parsing that?
>> There's far more data in that format than any of the others.
>>
>> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
>> -term => 'h2afx[sym] AND human[organism]',
>> -db => 'gene',
>> -usehistory => 'y');
>>
>>
>> my $hist = $factory->next_History || die "No history data returned";
>>
>> $factory->set_parameters(-eutil => 'efetch',-history => $hist);
>>
>> print Dumper $factory->get_Response;
>
> When I do this, I get XML with the ASN.1 inside a <pre> tag. Is it
> supposed to be this way? Should I extract it myself? Shouldn't the
> method do this? It's nice that I can get so much information, but
> wouldn't it be lighter on the NCBI server if I could ask only for the
> info that I need rather than the whole record?
After much work, I got this working, so I'm sharing the code in case
someone else comes across the same problem. Basically, get_Response
returns an HTTP::Response object (a subclass of HTTP::Message), and its
content is HTML. Since I couldn't find a method that returns just the
record, I used HTML::Parser to extract it. It seems that the
ASN.1/entrezgene data is all inside the <pre> tag. Also, if there's more
than one gene, all genes are inside the same <pre> tag. Here's the code
I used.
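use strict;
use warnings;
use Bio::DB::EUtilities;
use HTML::Parser;

my @ids = qw(9555 3014);
my $factory = Bio::DB::EUtilities->new(
    -eutil   => 'efetch',
    -db      => 'gene',
    -id      => \@ids,
    -retmode => 'asn1',
);

# The response body is HTML with the ASN.1 text inside a <pre> element.
my $html = $factory->get_Response->content;

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [\&handle_start],
    end_h       => [\&handle_end],
    text_h      => [\&handle_text, 'dtext'],
    report_tags => 'pre',    # only report start/end events for <pre>
);

my $seq = '';
{
    my $inside_tag = 0;    # true while the parser is inside <pre>
    sub handle_start { $inside_tag = 1 }
    sub handle_end   { $inside_tag = 0 }
    sub handle_text {
        # Text can arrive in several chunks, so append rather than assign.
        $seq .= $_[0] if $inside_tag;
    }
}
$parser->parse($html);
$parser->eof;    # signal end of document so any buffered text is flushed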
After running parse, $seq holds the ASN.1 (entrezgene) text for all the
requested genes, which can be read with Bio::SeqIO or written to disk.
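
Since all genes end up in the same <pre> text, it may help to split $seq
into one record per gene before writing it out. Below is a minimal sketch
using only core Perl. It assumes each record starts with a line beginning
"Entrezgene ::= {", which I have not verified against every efetch
response. The helper split_entrezgene_records, the placeholder record
text, and the file names are all made up for illustration and are not
part of BioPerl.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of BioPerl): split the text taken from the
# <pre> element into one string per gene record. Assumes every record
# begins with "Entrezgene ::= {" at the start of a line.
sub split_entrezgene_records {
    my ($text) = @_;
    # The zero-width lookahead keeps the header attached to each record.
    return grep { /\S/ } split /^(?=Entrezgene ::= \{)/m, $text;
}

# Made-up placeholder standing in for $seq; real records are far longer.
my $seq = <<'END';
Entrezgene ::= {
  track-info { geneid 9555 }
}
Entrezgene ::= {
  track-info { geneid 3014 }
}
END

my @records = split_entrezgene_records($seq);
printf "%d record(s) found\n", scalar @records;

# Write each record to its own file in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
my @files;
for my $i (0 .. $#records) {
    my $file = File::Spec->catfile($dir, "gene_record_$i.asn");
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} $records[$i];
    close $out or die "Cannot close $file: $!";
    push @files, $file;
}
```

If Bio::SeqIO is preferred, each file written this way should be readable
with its 'entrezgene' format, assuming Bio::ASN1::EntrezGene is installed.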
Carnë