[Bioperl-l] question about Bio::DB::GenBank

Jason Stajich jason@chg.mc.duke.edu
Wed, 13 Jun 2001 10:20:29 -0400 (EDT)


A diff of the changes to the original would be sufficient.  

A number of people have contributed to the code that Elia started and
would be happy to incorperate changes that are appropriate back into the
src code base.  

If you think you'd like to help improve the toolkit we are happy to accept
more hands on deck.  You might want to read the bioperl philosopy and get
acquainted with the project's goals at http://bio.perl.org if you are
interested.

-jason

On Tue, 12 Jun 2001, Bob Mangold wrote:

> Thanks to all.
> 
> In the future I'll be sure to be more descriptive with my problem. Here's my
> second question though. I've nearly got a work around figured out (or at least
> a more graceful death), but it requires modifiying
> 'lib/perl5/site_perl/5.6.0/Bio/SeqIO/genbank.pm' which is one the library
> files. Now I'm new to really working with modules and OO programing, but I
> assume there has to be some way to submit my work around. Not only that, I'm
> sure that Elia Stupka (the author of the script) doesn't want me modifing the
> code and re-distributing it. So what do I do with my work around?
> 
> -Bob
> 
> --- Jason Stajich <jason@chg.mc.duke.edu> wrote:
> > Bob - 
> > [ this will be a longwinded answer in case someone else wants to jump in
> > and fight the good fight with me wrt fixing genbank retrieval]
> > 
> > In the future, to be able to accurately reproduce your reported bug we
> > typically need:
> > 
> > a) when does 'all of the sudden' mean, yesterday vs today?
> > b) what version of bioperl you are using
> > c) what is the output of
> > % perldoc -m Bio::SeqIO::genbank | grep '$Id' 
> > 
> > Feel free to submit things like this to the bug tracking interface at 
> > http://www.bioperl.org/Bugs/
> > 
> > Now, all that said, let's see about your problem, which is not due to a
> > specific module version as I soon found out.
> > 
> > Hmm this is a complete genome, needless to say seqs this big probably
> > never made it into our test cases.
> > 
> > Since this is like a refseq sequence, just references to other
> > sequences, this is not actually genbank format (at least what bioperl
> > expects). So that CONTIG part is what is tripping us the genbank parser.
> > 
> > However, if we were smarter about parsing, $seq->seq() wouldn't return
> > anything if you were to run this on the file stored locally because there
> > are only sequence id references, not actual sequence data in this report.
> > 
> > Now there is a workaround, you can fetch entire seq as a fasta.  You have
> > to forgo having annotation though.
> > 
> > $gb = new Bio::DB::GenBank(-format => 'fasta');
> > 
> > However this too does not work either, and it does work in the tests for
> > smaller files, so perhaps since this is a rather large sequence something
> > is falling out at ncbi.   
> > 
> > We've basically reverse engineered their web scripts as best as possible
> > to allow us to write nice little objects for retrieval, but since we don't
> > have access to any one help.
> > 
> > Now we should be able to use the EMBL server which provides a very easy to
> > query interface based on accession number.  For your purposes this
> > requires just using a different DB module (isn't OO programming nice...)
> > 
> > #!/usr/local/bin/perl
> > use Bio::DB::EMBL;
> > $gb = new Bio::DB::EMBL();
> > my($id) = "AE004439";
> > $seq = $gb->get_Seq_by_id($id);
> > print $seq->seq ."\n";
> > 
> > This unfortunately only returns a sequence 12376 bases long, while the
> > genbank record appears to be 2257487 bases long.  Hmm, I'm stuck and can't
> > dive in more deeply at this point.
> > 
> > This appears to be a combination of NCBI magic and some incompatibilities.
> > I guess this should be submitted into the bugtracking system soon.
> > 
> > -jason
> > 
> > On Tue, 12 Jun 2001, Bob Mangold wrote:
> > 
> > > I'm farily new to using bioperl, so maybe I'm missing something. I've been
> > > succesfully using the Genbank tools to fetch data, but all of a sudden it
> > won't
> > > work. It appears that complete genome files in GenBank often contain a
> > contig
> > > of other files as opposed to the entire sequence (for example see
> > AE004439).
> > > Maybe I'm worng but it appears that Bio::DB::GenBank can't deal with this. 
> > > 
> > > Can someone confirm this and maybe point me to a work around.
> > > 
> > > Here is my code:
> > > 
> > > #!/usr/local/bin/perl
> > > use Bio::DB::GenBank;
> > > $gb = new Bio::DB::GenBank();
> > > my($id) = "AE004439";
> > > $seq = $gb->get_Seq_by_id($id);
> > > print $seq->seq ."\n";
> > > 
> > > The result is:
> > > 
> > > Can't call method "_generic_seqfeature" on an undefined value at
> > > /usr/local/lib/perl5/site_perl/5.6.0/Bio/SeqIO/genbank.pm line 272.
> > > 
> > > -Bob Mangold
> > > 
> > > __________________________________________________
> > > Do You Yahoo!?
> > > Get personalized email addresses from Yahoo! Mail - only $35 
> > > a year!  http://personal.mail.yahoo.com/
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@bioperl.org
> > > http://bioperl.org/mailman/listinfo/bioperl-l
> > > 
> > 
> > Jason Stajich
> > jason@chg.mc.duke.edu
> > Center for Human Genetics
> > Duke University Medical Center 
> > http://www.chg.duke.edu/ 
> > 
> > 
> 
> 
> __________________________________________________
> Do You Yahoo!?
> Get personalized email addresses from Yahoo! Mail - only $35 
> a year!  http://personal.mail.yahoo.com/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> 

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/