[Bioperl-l] Bio::DB::GenBank question (acc vs. version)

Chris Fields cjfields at illinois.edu
Wed Sep 16 12:55:56 UTC 2009


Bill, George,

It's worth clarifying the docs on these and adding a TODO for them  
(and test cases!), but I tend to agree.  I believe, re: version, we  
can possibly use Bio::DB::SeqVersion to grab the right one, but it'll  
need further investigation.

As for generic accession w/o version, efetch does support it but it  
does have problems (pulling up more than one sequence in rare cases,  
for instance).
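
In the meantime, a general-purpose wrapper along the lines George asks
for could be sketched roughly as below. This is only an illustration,
not anything in BioPerl: method_for_id and fetch_seq are hypothetical
helper names, and the regexes that decide what counts as a GI or a
versioned accession are assumptions. It simply follows Bill's
preference order (gi, then version, then bare accession) and dispatches
to the existing get_Seq_by_* methods.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): choose the Bio::DB::GenBank
# method name that fits the shape of the identifier.
sub method_for_id {
    my ($id) = @_;
    # Assumption: an all-digit identifier is a GI number.
    return 'get_Seq_by_gi'      if $id =~ /^\d+$/;
    # Assumption: a trailing ".N" marks a versioned accession (J00522.1).
    return 'get_Seq_by_version' if $id =~ /^\S+\.\d+$/;
    # Anything else is treated as a bare accession (J00522).
    return 'get_Seq_by_acc';
}

# Hypothetical wrapper: $db is any object providing the three methods
# above, e.g. one returned by Bio::DB::GenBank->new.
sub fetch_seq {
    my ($db, $id) = @_;
    my $method = method_for_id($id);
    return $db->$method($id);
}

# Usage (needs BioPerl and network access, so left commented out):
#   use Bio::DB::GenBank;
#   my $db  = Bio::DB::GenBank->new;
#   my $seq = fetch_seq($db, 'J00522.1');
```

The bare-accession branch still inherits the efetch caveats above, so
callers that care about reproducibility should pass a versioned
identifier or a GI where they can.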

chris

On Sep 13, 2009, at 10:47 AM, bill at genenformics.com wrote:

> I would like to make a few comments about get_Seq_by_version and
> get_Seq_by_acc. Although both functions use the same NCBI eUtils API,
> they are interpreted differently for a Seq_id with version or without
> version.
>
> 1. If the Seq_id has a version, the GenBank ID server will locate the
> corresponding GI and emit the correct sequence.
> 2. If the Seq_id does not have a version, GBDataLoader will try to
> find the latest version number for that Seq_id, which is relatively
> slower, and the version number the ID server finds may NOT always be
> the latest.
>
> IMHO, for both efficiency and consistency,
> get_Seq_by_gi > get_Seq_by_version >> get_Seq_by_acc
>
> Bill
>
>
>>
>> It looks like Bio::DB::GenBank::get_Seq_by_{version,acc} are
>> functionally identical.  They seem to trickle down to the same place
>> and walking through these two requests yields almost identical HTTP
>> requests:
>>
>>  $db->get_Seq_by_version('J00522.1')
>>  GET
>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&rettype=gbwithparts&db=nucleotide&tool=bioperl&id=J00522.1&usehistory=n
>>
>>  $db->get_Seq_by_acc('J00522')
>>  GET
>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&rettype=gbwithparts&db=nucleotide&tool=bioperl&id=J00522&usehistory=n
>>
>> The only difference that I can see is that they index into different
>> sections of %PARAMSTRING defined in Bio::DB::GenBank, but those
>> sections contain the same information.
>>
>> I'd like a general purpose tool that does The Right Thing whether
>> there's a .1 on the end of an identifier or not, and am just trying
>> to make sure I'm not doing something troublesome.
>>
>> Am I correct about the above?
>>
>> While I'm at it, I think that the comment
>>
>>  # note that get_Stream_by_version is not implemented
>>
>> in Bio::DB::GenBank was made obsolete by whoever commented out the
>>
>>  $self->throw(...)
>>
>> in get_Stream_by_version in Bio::WebDBSeqI.pm.
>>
>> I'll happily commit the trivial doc fix if no one shoots down the
>> idea. (can't help big, might as well help small...).
>>
>> Thanks,
>>
>> g.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>



