[Bioperl-l] Problem retrieving CDS by Acession #

Ryan Golhar golharam at umdnj.edu
Thu Sep 7 17:16:46 UTC 2006



> -----Original Message-----
> From: Sean Davis [mailto:sdavis2 at mail.nih.gov] 
> Sent: Thursday, September 07, 2006 11:49 AM
> To: golharam at umdnj.edu
> Cc: bioperl-l at lists.open-bio.org; 'bioperl-l'
> Subject: Re: [Bioperl-l] Problem retrieving CDS by Acession #
> 
> 
> On Thursday 07 September 2006 10:32, Ryan Golhar wrote:
> > > On Thursday 07 September 2006 01:09, Ryan Golhar wrote:
> > > > Hi,
> > > >
> > > > I'm using Bio::DB::GenBank::get_Seq_by_acc() passing in a valid 
> > > > accession #, XM_547879.2, for instance.
> > > >
> > > > I get the message in return:
> > > >
> > > > -------------------- WARNING ---------------------
> > > > MSG: acc (gb|XM_547879.2) does not exist
> > > > ---------------------------------------------------
> > > >
> > > > If I go to NCBI, and enter the accession, the GenBank entry
> > > > comes up.
> > > >
> > > > At first I suspected it was the version number, but removing the
> > > > version number still causes the same error.
> > > >
> > > > Am I doing something wrong?
> > >
> > > From the docs for Bio::DB::GenBank:
> > >
> > >     $seq = $gb->get_Seq_by_acc('J00522'); # Accession Number
> > >     $seq = $gb->get_Seq_by_version('J00522.1'); # Accession.version
> > >     $seq = $gb->get_Seq_by_gi('405830'); # GI Number
> > >
> > > So, you might try using get_Seq_by_version(....).  I didn't test it,
> > > but give that a shot.
> >
> > get_Seq_by_version() worked.
> >
> > That does not explain why get_Seq_by_acc does not work with the 
> > primary part of the accession #.
> 
> As an example of why this shouldn't work, doing a search in entrez
> (online version) will bring up the newest version of an accession if
> the version is not included.  If one specifies the version, though,
> one gets that version, even if it is not the newest.  So, asking
> get_Seq_by_acc() with a version and ignoring the version would
> potentially get you the wrong version for the accession.
> 
> If you know that you want the most recent version, just strip the
> version information and use get_Seq_by_acc().
> 
> Sean
> 


Sorry, maybe I'm not being clear.  Suppose I only had the accession #,
XM_547879, with no version.  If I call get_Seq_by_acc('XM_547879'), it
gives the same warning as above.  That shouldn't happen, because I'm
giving a valid accession number.  I suspect something is wrong in the
parsing of whatever NCBI is returning.
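
For reference, here is a rough sketch of how the two lookups discussed
above could be chosen based on whether the identifier carries a version
suffix.  The helper names split_accession and fetch_seq are hypothetical
(they are not part of BioPerl); only Bio::DB::GenBank->new,
get_Seq_by_acc() and get_Seq_by_version() come from the docs quoted
earlier.  I have not verified what it returns for XM_547879, and
depending on the BioPerl version a failed lookup may warn or throw.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split "XM_547879.2" into ("XM_547879", 2).
# The version is undef when the identifier has no ".N" suffix.
sub split_accession {
    my ($id) = @_;
    if ($id =~ /^(\S+?)\.(\d+)$/) {
        return ($1, $2);
    }
    return ($id, undef);
}

# Hypothetical helper: pick the lookup method from the identifier's form.
sub fetch_seq {
    my ($id) = @_;
    require Bio::DB::GenBank;    # loaded lazily; needs BioPerl and network access
    my $gb = Bio::DB::GenBank->new;
    my ($acc, $version) = split_accession($id);
    return defined $version
        ? $gb->get_Seq_by_version($id)    # exact accession.version
        : $gb->get_Seq_by_acc($acc);      # bare accession, no version
}

# Query NCBI only when run as a script with an identifier argument.
if (!caller && @ARGV) {
    my $seq = fetch_seq($ARGV[0]);
    print defined $seq
        ? $seq->display_id . "\n"
        : "no record returned for $ARGV[0]\n";
}
```

Running it once with XM_547879 and once with XM_547879.2 as the
argument should exercise both code paths.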



