[Bioperl-l] Annotation-DBLink- version numbers repeating
Jason Stajich
jason at bioperl.org
Thu Oct 19 17:44:51 UTC 2006
Yikes - I was worried that it might have been me.....
Okay, I'll look into fixing it -- ChrisF, check in with me before
diving in, in case I've already gotten it done, since I expect your
enzyme assays might take up your time.
-jason
On Oct 19, 2006, at 10:11 AM, Hilmar Lapp wrote:
> Actually you did that Jason: http://tinyurl.com/ye2edk
>
> Apparently the motivation was to "parse swissprot fields in genpept
> file (dbsource)"?
>
> It clearly looks wrong to add the version. You probably had a
> reason for doing this at the time, but if we (you :) can't recover
> it, I guess it's best to just fix it to do the right thing (in
> both places, obviously).
>
> -hilmar
>
> On Oct 19, 2006, at 11:50 AM, Jason Stajich wrote:
>
>> Well there is explicit addition of the version to the primary id
>> so it isn't so much a parsing error as a deliberate decision to
>> append it.
>> see Bio::SeqIO::genbank
>>
>> to make the dblink
>>     $annotation->add_Annotation(
>>         'dblink',
>>         Bio::Annotation::DBLink->new(
>>             -primary_id => $id . "." . $version,
>>             -version    => $version,
>>             -database   => $db,
>>             -tagname    => 'dblink'));
>>
>> and the code to print the dblink back out in the writer already
>> assumes the version number is appended...
>>
>> foreach my $ref ( $seq->annotation->get_Annotations('dblink') ) {
>>     # if ($ref->comment eq 'DBSOURCE') {
>>     $self->_print('DBSOURCE accession ', $ref->primary_id, "\n");
>>     # }
>> }
>>
>> On Oct 19, 2006, at 6:56 AM, Hilmar Lapp wrote:
>>
>>> Here is the overload code:
>>>
>>> use overload '""' => sub {
>>> (($_[0]->database ? $_[0]->database . ':' : '' )
>>> . ($_[0]->primary_id ? $_[0]->primary_id : '')
>>> . ($_[0]->version ? '.' . $_[0]->version : ''))
>>> || '' };
>>>
>>> Except that the last '||' is redundant and unnecessary (it either
>>> does nothing or replaces an empty string with an empty string), I
>>> don't see the potential for duplicating the version number here -
>>> unless primary_id() did that, which I don't see it doing.
>>>
>>> So, to me this seems to come from a parsing error in the
>>> beginning, rather than an erroneous mangling of version into
>>> primary_id later.
>>>
>>> Is someone in the position to confirm this?
>>>
>>> -hilmar
>>>
>>> On Oct 19, 2006, at 1:00 AM, Jason Stajich wrote:
>>>
>>>> So I'm unsure what we should do here.
>>>>
>>>> We can certainly fix the problem you report, which comes from relying
>>>> on the "" method -- if you were to do instead:
>>>> print $_->database, ":", $_->primary_id, "\n";
>>>>
>>>> you'll get the right answer. At a minimum we can just fix the
>>>> auto-string-converting method to do The Right Thing.
>>>>
>>>> But I am not sure if we should keep the version out of the primary_id
>>>> field. This will require some rejiggering in several modules when it
>>>> comes to printing DBlinks and I don't want to do this before the
>>>> release. I also am not sure if there was an explicit reason why
>>>> someone did put the version information in the primary_id. (I hope it
>>>> wasn't me because I don't think I'm going to remember why).
>>>>
>>>> Does anyone else have a strong feeling?
>>>>
>>>> -jason
>>>> On Oct 17, 2006, at 12:01 PM, Erikjan wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I noticed a little problem with the Annotation "DBLink" from GenBank
>>>>> entries.
>>>>>
>>>>> When I run:
>>>>>
>>>>> perl -MBio::DB::GenBank -e 'my $gi = 56205924;
>>>>>   $db = Bio::DB::GenBank->new(-format => "genbank");
>>>>>   my $seqio = $db->get_Stream_by_id($gi);
>>>>>   my $seq = $seqio->next_seq;
>>>>>   my $ac = $seq->annotation();
>>>>>   my @annotations = $ac->get_Annotations("dblink");
>>>>>   for (@annotations) { print $_, "\n"; }
>>>>>   print $INC{"Bio/Annotation/DBLink.pm"}, "\n";'
>>>>>
>>>>> This yields:
>>>>>
>>>>> GenBank:AL591065.17.17
>>>>>
>>>>> and the path of the Bio/Annotation/DBLink.pm file that was loaded.
>>>>>
>>>>> Can others repeat this?
>>>>>
>>>>> I have dug into the source a little, and Bio::Annotation::DBLink
>>>>> seems to be the place where this happens: it has a concatenation
>>>>> that leads to the repeated version number.
>>>>>
>>>>> Is this something that I should fix "client-side", so to speak, or
>>>>> is it worthwhile to add some logic to that concatenation to prevent
>>>>> this?
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Eric
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> --
>>>> Jason Stajich, PhD
>>>> Miller Research Fellow
>>>> University of California
>>>> Dept of Plant and Microbial Biology
>>>> 321 Koshland Hall #3102
>>>> Berkeley, CA 94720-3102
>>>> lab: 510.642.8441
>>>> http://pmb.berkeley.edu/~taylor/people/js.html
>>>
>>> --
>>> ===========================================================
>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>>> ===========================================================
--
Jason Stajich, PhD
Miller Research Fellow
University of California
Dept of Plant and Microbial Biology
321 Koshland Hall #3102
Berkeley, CA 94720-3102
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
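
For reference, a minimal sketch of how the doubled version arises when the
parser's primary_id meets the "" overload quoted above. It does not use
BioPerl: Sketch::DBLink and its as_text method are hypothetical stand-ins
for Bio::Annotation::DBLink and its stringification, and only the
concatenation logic is taken from the thread. The second object shows the
alternative discussed here (version kept out of primary_id), not a
committed fix.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::Annotation::DBLink (NOT the real class).
package Sketch::DBLink;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub database   { $_[0]{database} }
sub primary_id { $_[0]{primary_id} }
sub version    { $_[0]{version} }

# Same concatenation as the overload quoted in the thread:
# database ":" primary_id "." version, each part optional.
sub as_text {
    my ($self) = @_;
    return ($self->database   ? $self->database . ':' : '')
         . ($self->primary_id ? $self->primary_id     : '')
         . ($self->version    ? '.' . $self->version  : '');
}

package main;

# As built by the parser code quoted above: version already in primary_id.
my $appended = Sketch::DBLink->new(
    database   => 'GenBank',
    primary_id => 'AL591065' . '.' . 17,
    version    => 17,
);

# Alternative: primary_id holds the bare accession only.
my $separate = Sketch::DBLink->new(
    database   => 'GenBank',
    primary_id => 'AL591065',
    version    => 17,
);

print $appended->as_text, "\n";    # GenBank:AL591065.17.17
print $separate->as_text, "\n";    # GenBank:AL591065.17
```

With the bare accession, the writer code that prints primary_id directly
would need to append the version itself, which is the "both places" change
mentioned above.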