[Bioperl-l] sets of sequences - how to read?

Fields, Christopher J cjfields at illinois.edu
Fri May 17 04:08:04 UTC 2013


This doesn't surprise me too much; I know there have been some changes brewing, but didn't know when they would land.  I guess that would be... <looks at watch>... now.

My feeling is this will require writing some code for a higher-level layer of abstraction, say a Bio::DB::* (which would allow some internal indexing of the files, maybe using a Bio::Index::*, lookups for specific gene IDs, etc.).  How hard that would be to implement is another matter; I have no idea without seeing what the data look like, beyond knowing they are in ASN.1.
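
As a rough illustration of what the lowest piece of such a layer might do, here is a minimal sketch that splits the text ASN.1 of one or more Entrezgene-Set blocks into individual records while remembering which set each record came from. Everything in it is an assumption: split_entrezgene_set is a hypothetical helper and not part of BioPerl, the layout "Entrezgene-Set ::= { { ... }, { ... } }" is assumed rather than checked against the spec, and the heredoc is a made-up toy input, not real NCBI output.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): walk the text once, tracking
# brace depth, and cut out every block that opens directly inside a set.
# Assumed layout: "Entrezgene-Set ::= { { ... }, { ... } }".
sub split_entrezgene_set {
    my ($text) = @_;
    my @records;
    my ($depth, $in_string, $set, $start) = (0, 0, 0, undef);

    for my $pos (0 .. length($text) - 1) {
        my $char = substr($text, $pos, 1);
        if ($char eq '"') {
            # a doubled quote ("") inside a string toggles twice, which is harmless
            $in_string = !$in_string;
        }
        next if $in_string;    # braces inside quoted strings do not count

        if ($char eq '{') {
            $depth++;
            $set++ if $depth == 1;           # a new set opens
            $start = $pos if $depth == 2;    # a record opens inside the set
        }
        elsif ($char eq '}') {
            if ($depth == 2 && defined $start) {
                my $body = substr($text, $start, $pos - $start + 1);
                # re-wrap as a single record and keep the set number
                push @records, { set => $set, asn1 => "Entrezgene ::= $body" };
                undef $start;
            }
            $depth--;
        }
    }
    return @records;
}

# Toy input, made up for illustration only
my $demo = <<'ASN1';
Entrezgene-Set ::= {
  {
    track-info { geneid 3014 },
    gene { desc "braces {in} a string" }
  },
  {
    track-info { geneid 85235 }
  }
}
ASN1

my @records = split_entrezgene_set($demo);
printf "set %d: %d characters\n", $_->{set}, length($_->{asn1}) for @records;
```

Each re-wrapped string could then presumably be handed to a parser that only understands single Entrezgene records, and the set number is kept so that set membership is not thrown away. Whether that is enough for a real Bio::DB::* layer is an open question.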

chris

On May 15, 2013, at 8:53 PM, Carnë Draug <carandraug+dev at gmail.com> wrote:

> Hi
> 
> when accessing entrez gene using eutils to get multiple genes, NCBI
> now returns an Entrezgene-Set[1] rather than a list of EntrezGene.
> This change must have happened sometime in the last 2 months. Compare:
> 
> use Bio::DB::EUtilities;
> 
> my %sets = (
>  eutil   => 'efetch',
>  db      => 'gene',
>  retmode => 'text',
>  rettype => 'asn1',
>  email   => 'bioperl-l at lists.open-bio.org',
> );
> 
> ## this mimics the previous behaviour of the NCBI server, but the
> ## multiple requests will annoy their servers
> my @ids = (3014, 85235);
> my $response = '';
> foreach my $id (@ids) {
>  my $fetcher = Bio::DB::EUtilities->new(%sets, id => $id);
>  $response .= $fetcher->get_Response->content;
> }
> print $response;
> 
> ## this used to be the right way to do it, but now returns an Entrezgene-Set
> my $fetcher = Bio::DB::EUtilities->new(%sets, id => \@ids);
> print $fetcher->get_Response->content;
> 
> There is no module to read these Entrezgene-Set in Perl at the moment,
> since Bio::ASN1::EntrezGene is not able to handle them. I have
> contacted the module author and sent him a fix[2] and he said he'll try
> to look into it next week.
> 
> However, even with the fix there is another problem. How would one
> access a set of sequences using the Bio::SeqIO API? There is no method
> to do that. One could ignore the set boundaries and make next_seq
> return the next sequence of the set, but then we are losing data. After
> all, it's perfectly viable to have multiple Entrezgene-Set in one file.
> What would be the right way to do this?
> 
> Carnë
> 
> [1] http://0-www.ncbi.nlm.nih.gov.elis.tmu.edu.tw/IEB/ToolBox/CPP_DOC/asn_spec/Entrezgene-Set.html
> [2] https://github.com/carandraug/bio-asn1-entrezgene/commit/69d505056d8b7897df6271ffb7a5f39d58873c6b