[Bioperl-l] Annotation-DBLink- version numbers repeating

Chris Fields cjfields at uiuc.edu
Thu Oct 19 18:03:52 UTC 2006


It also seems that the DBSOURCE line isn't parsed correctly, so its contents
end up by default in a GenBank dblink (the DBSOURCE in the test case is EMBL,
not GenBank).

http://bugzilla.open-bio.org/show_bug.cgi?id=2124

It looks like NCBI may be now using:

DBSOURCE    embl accession Z49548.1

instead of the old version:

DBSOURCE    embl locus SCYJR048W, accession Z49548.1

I don't recall NCBI mentioning changes regarding DBSOURCE in any of the
recent release notes.
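A minimal sketch of how a parser could accept both DBSOURCE layouts quoted
above. parse_dbsource is a hypothetical helper, not part of
Bio::SeqIO::genbank. Its regexes assume only the two forms shown here, and it
keeps the accession and version in separate fields:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl API): split a GenPept DBSOURCE line into
# database, optional locus, accession and optional version. Assumes only the
# two layouts quoted in this message:
#   DBSOURCE    embl accession Z49548.1
#   DBSOURCE    embl locus SCYJR048W, accession Z49548.1
sub parse_dbsource {
    my ($line) = @_;
    return unless $line =~ /^DBSOURCE\s+(\S+)\s+(.*)$/;
    my ($db, $rest) = ($1, $2);
    my %link = (database => $db);

    # Older form carries a locus name before the accession.
    $link{locus} = $1 if $rest =~ /\blocus\s+([^,\s]+)/;

    # Keep the version out of the accession itself.
    if ($rest =~ /\baccession\s+([^.\s,]+)(?:\.(\d+))?/) {
        $link{primary_id} = $1;
        $link{version}    = $2 if defined $2;
    }
    return \%link;
}

for my $line ('DBSOURCE    embl accession Z49548.1',
              'DBSOURCE    embl locus SCYJR048W, accession Z49548.1') {
    my $link = parse_dbsource($line);
    # both lines print: embl:Z49548 version 1
    printf "%s:%s version %s\n",
        $link->{database}, $link->{primary_id}, $link->{version};
}
```

Mapping the lowercase "embl" onto whatever database name the DBLink should
carry is left open here.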

Chris

> Actually you did that Jason: http://tinyurl.com/ye2edk
> 
> Apparently the motivation was to "parse swissprot fields in genpept
> file (dbsource)"?
> 
> It clearly looks wrong to add the version. You probably had a reason
> for doing this at the time, but if we (you :) can't recover it, I
> guess it's best to just fix it to do the right thing (in both places,
> obviously).
> 
> 	-hilmar
> 
> On Oct 19, 2006, at 11:50 AM, Jason Stajich wrote:
> 
> > Well, there is an explicit addition of the version to the primary_id,
> > so it isn't so much a parsing error as a deliberate decision to
> > append it.  See Bio::SeqIO::genbank, where the dblink is made:
> >
> >     $annotation->add_Annotation
> >         ('dblink',
> >          Bio::Annotation::DBLink->new
> >              (-primary_id => $id . "." . $version,
> >               -version    => $version,
> >               -database   => $db,
> >               -tagname    => 'dblink'));
> >
> > and the code to print the dblink back out in the writer already
> > assumes the version number is appended...
> >
> >         foreach my $ref ( $seq->annotation->get_Annotations('dblink') ) {
> >             # if ($ref->comment eq 'DBSOURCE') {
> >             $self->_print('DBSOURCE    accession ',
> >                           $ref->primary_id, "\n");
> >             # }
> >         }
> >
> > On Oct 19, 2006, at 6:56 AM, Hilmar Lapp wrote:
> >
> >> Here is the overload code:
> >>
> >> use overload '""' => sub {
> >> 	(($_[0]->database ? $_[0]->database . ':' : '' )
> >> 	. ($_[0]->primary_id ? $_[0]->primary_id : '')
> >> 	. ($_[0]->version ? '.' . $_[0]->version : ''))
> >> 	|| '' };
> >>
> >> Except that the last '||' is redundant and unnecessary (it either
> >> does nothing or replaces an empty string with an empty string), I
> >> don't see the potential for duplicating the version number here -
> >> unless primary_id() did that, which I don't see it doing.
> >>
> >> So, to me this seems to come from a parsing error in the
> >> beginning, rather than an erroneous mangling of version into
> >> primary_id later.
> >>
> >> Is someone in a position to confirm this?
> >>
> >> 	-hilmar
> >>
> >> On Oct 19, 2006, at 1:00 AM, Jason Stajich wrote:
> >>
> >>> So I'm unsure what we should do here.
> >>>
> >>> We can certainly fix the problem you report, which comes from
> >>> relying on the "" (stringification) method -- if you do this
> >>> instead:
> >>> print $_->database, ":", $_->primary_id, "\n";
> >>>
> >>> you'll get the right answer.  At a minimum we should fix the
> >>> auto-stringification method to do The Right Thing.
> >>>
> >>> But I am not sure whether we should keep the version out of the
> >>> primary_id field.  That would require some rejiggering in several
> >>> modules when it comes to printing DBLinks, and I don't want to do
> >>> it before the release.  I am also not sure whether there was an
> >>> explicit reason why someone put the version information in the
> >>> primary_id.  (I hope it wasn't me, because I don't think I'm going
> >>> to remember why.)
> >>>
> >>> Does anyone else have a strong feeling?
> >>>
> >>> -jason
> >>> On Oct 17, 2006, at 12:01 PM, Erikjan wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> I noticed a little problem with the "DBLink" annotations from
> >>>> GenBank entries.
> >>>>
> >>>> When I run:
> >>>>
> >>>> perl -MBio::DB::GenBank -e 'my $gi = 56205924;
> >>>>   my $db = Bio::DB::GenBank->new(-format => "genbank");
> >>>>   my $seqio = $db->get_Stream_by_id($gi);
> >>>>   my $seq = $seqio->next_seq;
> >>>>   my $ac = $seq->annotation();
> >>>>   my @annotations = $ac->get_Annotations("dblink");
> >>>>   for (@annotations) { print $_, "\n"; }
> >>>>   print $INC{"Bio/Annotation/DBLink.pm"}, "\n";'
> >>>>
> >>>> This yields:
> >>>>
> >>>>    GenBank:AL591065.17.17
> >>>>
> >>>> and the place where the used Bio/Annotation/DBLink.pm resides.
> >>>>
> >>>> Can others repeat this?
> >>>>
> >>>> I have dug into the source a little, and Bio::Annotation::DBLink
> >>>> seems to be where this happens: it has a concatenation which leads
> >>>> to the repeated version number.
> >>>>
> >>>> Is this something I should fix "client-side", so to speak, or is
> >>>> it worthwhile to add some logic to that concatenation to prevent
> >>>> this?
> >>>>
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Eric
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>> --
> >>> Jason Stajich, PhD
> >>> Miller Research Fellow
> >>> University of California
> >>> Dept of Plant and Microbial Biology
> >>> 321 Koshland Hall #3102
> >>> Berkeley, CA 94720-3102
> >>> lab: 510.642.8441
> >>> http://pmb.berkeley.edu/~taylor/people/js.html
> >>>
> >>>
> >>>
> >>
> >> --
> >> ===========================================================
> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >> ===========================================================
> >>
> >>
> >>
> >>
> >>
> >
> 
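To make the doubled version discussed in this thread concrete, here is a small
self-contained sketch. MockDBLink is a hypothetical stand-in, not the real
Bio::Annotation::DBLink: it copies only the stringification overload Hilmar
quoted (minus the redundant `|| ''`) and assumes, per Jason's reading of
Bio::SeqIO::genbank, that the parser stores the version both inside
primary_id and in version:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::Annotation::DBLink (NOT the real module).
# Only reproduces the '""' overload quoted in this thread.
package MockDBLink;
use overload '""' => sub {
    my $self = shift;
    my $str = ($self->database   ? $self->database . ':' : '')
            . ($self->primary_id ? $self->primary_id     : '')
            . ($self->version    ? '.' . $self->version  : '');
    return $str;
};

sub new {
    my ($class, %args) = @_;   # plain keys here, unlike BioPerl's -database etc.
    my $self = { %args };
    return bless $self, $class;
}
sub database   { $_[0]{database} }
sub primary_id { $_[0]{primary_id} }
sub version    { $_[0]{version} }

package main;

# As the genbank parser builds it: version appended to primary_id AND kept
# in version, so stringification emits it twice.
my $as_parsed = MockDBLink->new(database   => 'GenBank',
                                primary_id => 'AL591065.17',
                                version    => 17);
print "$as_parsed\n";   # GenBank:AL591065.17.17

# Version kept out of primary_id: stringifies as expected.
my $fixed = MockDBLink->new(database   => 'GenBank',
                            primary_id => 'AL591065',
                            version    => 17);
print "$fixed\n";       # GenBank:AL591065.17
```

The second object is what keeping the version out of primary_id would look
like; as Jason notes, the writer would then have to append the version itself
when printing DBSOURCE.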



