[Bioperl-l] Annotation-DBLink- version numbers repeating

Chris Fields cjfields at uiuc.edu
Thu Oct 19 19:07:05 UTC 2006


Jason, Hilmar, 

How about changing the default dblink parsing in SeqIO::genbank (line 520) to:

		if( $dbsource =~ /^(\S*?)\s*accession\s+(\S+)\.(\d+)/ ) {
		    my ($db,$id,$version) = ($1,$2,$3);
		    $annotation->add_Annotation
			('dblink',
			 Bio::Annotation::DBLink->new
			 (-primary_id => $id,
			  -version => $version,
			  -database => $db || 'GenBank',
			  -tagname => 'dblink'));
		} 
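A minimal, self-contained sketch of what the proposed regex captures. parse_dbsource() is a hypothetical helper, not a Bio::SeqIO::genbank method. The two sample DBSOURCE values are assumed inputs modelled on this thread; real GenPept DBSOURCE fields can carry more text after the accession.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the proposed regex. Returns
# (database, accession, version), or an empty list when the line
# does not look like "[db ]accession ID.VERSION".
sub parse_dbsource {
    my ($dbsource) = @_;
    if ( $dbsource =~ /^(\S*?)\s*accession\s+(\S+)\.(\d+)/ ) {
        my ( $db, $id, $version ) = ( $1, $2, $3 );
        # An empty database prefix falls back to 'GenBank', as in the patch.
        return ( $db || 'GenBank', $id, $version );
    }
    return;
}

for my $line ( 'accession AL591065.17', 'embl accession AL591065.17' ) {
    my ( $db, $id, $version ) = parse_dbsource($line);
    print "$db | $id | $version\n";
}
# prints:
# GenBank | AL591065 | 17
# embl | AL591065 | 17
```

Because the version is captured separately, -primary_id receives only the bare accession and the version is stored once, in -version.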

It passes tests and catches the optional database ('embl' for the bugzilla
report).  When the sequence is written back out via write_seq(), the DB is
still not printed if it isn't GenBank, but that shouldn't be too hard to fix
(famous last words).
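A rough sketch of the writer side, assuming primary_id holds only the bare accession once the parse change above is in: the database prefix is printed only when it is not GenBank, and the version is joined back on exactly once. format_dbsource() is hypothetical, and the 'DBSOURCE    ' label layout is borrowed from the writer loop quoted later in this thread rather than checked against the GenBank format itself.

```perl
use strict;
use warnings;

# Hypothetical formatter, the inverse of the proposed parse step.
# Assumes primary_id holds the bare accession and the version is kept
# separately, so the two are joined exactly once here.
sub format_dbsource {
    my ( $db, $id, $version ) = @_;
    # Only non-GenBank sources get an explicit database prefix.
    my $prefix = ( defined $db && $db ne 'GenBank' ) ? "$db " : '';
    my $acc = defined $version ? "$id.$version" : $id;
    return "DBSOURCE    ${prefix}accession $acc\n";
}

print format_dbsource( 'GenBank', 'AL591065', 17 );  # DBSOURCE    accession AL591065.17
print format_dbsource( 'embl',    'AL591065', 17 );  # DBSOURCE    embl accession AL591065.17
```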

Okay, okay, back to the assays...

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Thursday, October 19, 2006 12:45 PM
> To: Hilmar Lapp
> Cc: bioperl-l at lists.open-bio.org; Erikjan
> Subject: Re: [Bioperl-l] Annotation-DBLink- version numbers repeating
> 
> Yikes - I was worried that it might have been me.....
> 
> Okay, I'll look into fixing it -- ChrisF, check in with me before
> diving in, in case I've already got it done; I expect your enzyme
> assays might take up the time.
> 
> -jason
> On Oct 19, 2006, at 10:11 AM, Hilmar Lapp wrote:
> 
> > Actually you did that Jason: http://tinyurl.com/ye2edk
> >
> > Apparently the motivation was to "parse swissprot fields in genpept
> > file (dbsource)"?
> >
> > It clearly looks wrong to add the version. You probably had a
> > reason for doing this at the time, but if we (you :) can't recover
> > that, I guess it's best to just fix it to do the right thing (in
> > both places, obviously).
> >
> > 	-hilmar
> >
> > On Oct 19, 2006, at 11:50 AM, Jason Stajich wrote:
> >
> >> Well there is explicit addition of the version to the primary id
> >> so it isn't so much a parsing error as a deliberate decision to
> >> append it.
> >> see Bio::SeqIO::genbank
> >>
> >> to make the dblink
> >>         $annotation->add_Annotation
> >>             ('dblink',
> >>              Bio::Annotation::DBLink->new
> >>              (-primary_id => $id . "." . $version,
> >>               -version    => $version,
> >>               -database   => $db,
> >>               -tagname    => 'dblink'));
> >>
> >> and the code to print the dblink back out in the writer already
> >> assumes the version number is appended...
> >>
> >>         foreach my $ref ( $seq->annotation->get_Annotations('dblink') ) {
> >>             # if ($ref->comment eq 'DBSOURCE') {
> >>             $self->_print('DBSOURCE    accession ',
> >>                           $ref->primary_id, "\n");
> >>             # }
> >>         }
> >>
> >> On Oct 19, 2006, at 6:56 AM, Hilmar Lapp wrote:
> >>
> >>> Here is the overload code:
> >>>
> >>> use overload '""' => sub {
> >>> 	(($_[0]->database ? $_[0]->database . ':' : '' )
> >>> 	. ($_[0]->primary_id ? $_[0]->primary_id : '')
> >>> 	. ($_[0]->version ? '.' . $_[0]->version : ''))
> >>> 	|| '' };
> >>>
> >>> Except that the last '||' is redundant and unnecessary (it either
> >>> does nothing or replaces an empty string with an empty string), I
> >>> don't see the potential for duplicating the version number here -
> >>> unless primary_id() did that, which I don't see it doing.
> >>>
> >>> So, to me this seems to come from a parsing error in the
> >>> beginning, rather than an erroneous mangling of version into
> >>> primary_id later.
> >>>
> >>> Is someone in the position to confirm this?
> >>>
> >>> 	-hilmar
> >>>
> >>> On Oct 19, 2006, at 1:00 AM, Jason Stajich wrote:
> >>>
> >>>> So I'm unsure what we should do here.
> >>>>
> >>>> We can certainly fix the problem you report, which relies on
> >>>> the "" method -- if you were to do instead:
> >>>> print $_->database, ":", $_->primary_id, "\n";
> >>>>
> >>>> you'll get the right answer.  At a minimum we can just fix the
> >>>> auto-string converting method to do The Right Thing.
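Along the lines of Jason's workaround, a client-side helper can build the label from the three DBLink fields and avoid the doubled version whether or not the parser has already appended it to primary_id. dblink_label() is a hypothetical name; with a real Bio::Annotation::DBLink object you would pass it $link->database, $link->primary_id and $link->version.

```perl
use strict;
use warnings;

# Hypothetical client-side helper: builds "db:accession.version" from the
# three DBLink fields without doubling the version, whether or not the
# parser already appended it to primary_id.
sub dblink_label {
    my ( $db, $primary_id, $version ) = @_;
    my $has_version = defined $version && length $version;
    my $id = $primary_id;
    # Drop a trailing ".<version>" that duplicates the version field.
    $id =~ s/\.\Q$version\E$// if $has_version;
    my $label = ( defined $db && length $db ) ? "$db:" : '';
    $label .= $id;
    $label .= ".$version" if $has_version;
    return $label;
}

print dblink_label( 'GenBank', 'AL591065.17', 17 ), "\n";  # GenBank:AL591065.17
print dblink_label( 'GenBank', 'AL591065',    17 ), "\n";  # GenBank:AL591065.17
```

Stripping only an exact ".<version>" suffix keeps accessions such as "X.17" intact when the version field is, say, 7.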
> >>>>
> >>>> But I am not sure if we should keep the version out of the
> >>>> primary_id field.  This will require some rejiggering in several
> >>>> modules when it comes to printing DBlinks and I don't want to do
> >>>> this before the release. I also am not sure if there was an
> >>>> explicit reason why someone did put the version information in
> >>>> the primary_id. (I hope it wasn't me because I don't think I'm
> >>>> going to remember why).
> >>>>
> >>>> Does anyone else have a strong feeling?
> >>>>
> >>>> -jason
> >>>> On Oct 17, 2006, at 12:01 PM, Erikjan wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> I noticed a little problem with the "DBLink" annotations from
> >>>>> GenBank entries.
> >>>>>
> >>>>> When I run:
> >>>>>
> >>>>> perl -MBio::DB::GenBank -e 'my $gi = 56205924;
> >>>>>   my $db = Bio::DB::GenBank->new(-format => "genbank");
> >>>>>   my $seqio = $db->get_Stream_by_id($gi); my $seq = $seqio->next_seq;
> >>>>>   my $ac = $seq->annotation();
> >>>>>   my @annotations = $ac->get_Annotations("dblink");
> >>>>>   for (@annotations) { print $_, "\n" }
> >>>>>   print $INC{"Bio/Annotation/DBLink.pm"}, "\n";'
> >>>>>
> >>>>> This yields:
> >>>>>
> >>>>>    GenBank:AL591065.17.17
> >>>>>
> >>>>> followed by the path of the Bio/Annotation/DBLink.pm that was loaded.
> >>>>>
> >>>>> Can others repeat this?
> >>>>>
> >>>>> I have dug into the source a little, and Bio::Annotation::DBLink
> >>>>> seems to be the place where this happens: it has a concatenation
> >>>>> which leads to that repeated version number.
> >>>>>
> >>>>> Is this something that I should fix "client-side", so to speak,
> >>>>> or is it worthwhile to add some logic to that concatenation to
> >>>>> prevent this?
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Eric
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioperl-l mailing list
> >>>>> Bioperl-l at lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>
> >>>> --
> >>>> Jason Stajich, PhD
> >>>> Miller Research Fellow
> >>>> University of California
> >>>> Dept of Plant and Microbial Biology
> >>>> 321 Koshland Hall #3102
> >>>> Berkeley, CA 94720-3102
> >>>> lab: 510.642.8441
> >>>> http://pmb.berkeley.edu/~taylor/people/js.html
> >>>>
> >>>
> >>> --
> >>> ===========================================================
> >>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >>> ===========================================================
> >>>
> >
> 