[Bioperl-l] FileCache.pm error

Jason Stajich jason at cgt.duhs.duke.edu
Mon Jun 21 14:39:09 EDT 2004


You might want to try Bio::SearchIO for parsing the BLAST report, though
sed and awk will work too, I guess.
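
For what it's worth, here is a minimal sketch of the SearchIO route. It
assumes a plain-text BLAST report whose path is given on the command line;
the usage message and the choice of printed columns are only illustrative,
not something from the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

# Path to a BLAST report (hypothetical input; supply your own file)
my $file = shift or die "usage: $0 blast_report.txt\n";

my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);

# One result per query in the report, one hit per matching database entry
while ( my $result = $in->next_result ) {
    while ( my $hit = $result->next_hit ) {
        print join("\t", $result->query_name,
                         $hit->name,
                         $hit->accession,
                         $hit->significance), "\n";
    }
}
```

The accessions printed this way could feed a retrieval script directly,
without going through an intermediate HTML file.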

To write sequences in GenBank format:
 # with no -file or -fh argument, SeqIO writes to STDOUT
 my $out = Bio::SeqIO->new(-format => 'genbank');
 $out->write_seq($cdsseq);

If you want to fetch records from GenBank in batch, see Bio::DB::GenBank.
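
A rough sketch of what a batch request might look like, assuming network
access to NCBI and that the identifiers are ones Bio::DB::GenBank accepts.
The id below is the one from the URL in the question; the output file name
batch.gb is just a placeholder:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::SeqIO;

# Identifiers to fetch; 34112904 comes from the URL in the question.
# In practice the list would be read from a file.
my @ids = (34112904);

my $gb = Bio::DB::GenBank->new;

# get_Stream_by_id takes an array reference and returns a SeqIO-style stream
my $stream = $gb->get_Stream_by_id(\@ids);

# 'batch.gb' is a hypothetical output file name
my $out = Bio::SeqIO->new(-format => 'genbank', -file => '>batch.gb');

while ( my $seq = $stream->next_seq ) {
    $out->write_seq($seq);
}
```

This sends the ids in one request rather than one request per sequence,
which is kinder to the server when there are several hundred of them.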

-jason
On Mon, 21 Jun 2004, Marcelino Suzuki wrote:

> 	Thanks Jason.  That worked.
>
> 	I have another question. The script works well, but I was wondering
> whether I can get the same CDS sequences in GenBank format.  I was able
> to create an HTML file (using sed and awk) from a BLAST search
> containing links to all 400 such sequences from proteins I am working
> with, i.e.:
>
> 	http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=34112904&itemID=36&view=gbwithparts
>
> 	and could get each sequence individually using the browser, but is
> there a way to batch those requests using bioperl?
>
> 	Thanks
>
> 	Marcelino
> On Jun 21, 2004, at 1:22 AM, Jason Stajich wrote:
>
> > Did you make the directory
> > /tmp/cache
> > on your machine?
> >
> > The FileCache stuff is overkill depending on what you want to do.
> >
> > You can also leave it out by just saying:
> >
> > my $cachent = $ntdb;
> > my $cachepep= $pepdb;
> >
> > -jason
> > On Sun, 20 Jun 2004, Marcelino Suzuki wrote:
> >
> >> 	I am trying to run the script below, by Jason Stajich, for getting CDS
> >> sequences out of GenBank. I saved it as test2.pl and get the following
> >> error message, which I believe is caused by my bioperl configuration (I
> >> just installed bioperl on Mac OS X):
> >>
> >> 	------------- EXCEPTION  -------------
> >> MSG: Could not open primary index file
> >> STACK Bio::DB::FileCache::_open_database /Library/Perl/5.8.1/Bio/DB/FileCache.pm:321
> >> STACK Bio::DB::FileCache::new /Library/Perl/5.8.1/Bio/DB/FileCache.pm:127
> >> STACK toplevel test2.pl:14
> >>
> >> 	Does anyone have any idea why I get this error?
> >>
> >> 	Thanks
> >>
> >> 	Marcelino
> >>
> >>
> >> #!/usr/bin/perl -w
> >> use strict;
> >> use Bio::DB::GenBank;
> >> use Bio::DB::GenPept;
> >> use Bio::DB::FileCache;
> >> use Bio::Factory::FTLocationFactory;
> >> use Bio::SeqFeature::Generic;
> >>
> >> my $ntdb = new Bio::DB::GenBank;
> >> my $pepdb= new Bio::DB::GenPept;
> >>
> >> # do some caching in the event you're pulling up the same
> >> # chromosome and/or you are debugging
> >> my $cachent = new Bio::DB::FileCache(-kept => 1,
> >>                                       -file => '/tmp/cache/nt.idx',
> >>                                       -seqdb => $ntdb);
> >>
> >> my $cachepep = new Bio::DB::FileCache(-kept => 1,
> >>                                        -file => '/tmp/cache/pep.idx',
> >>                                        -seqdb => $pepdb);
> >>
> >> # obj to turn strings into Bio::Location object
> >> my $locfactory = new Bio::Factory::FTLocationFactory;
> >>
> >> # you might get these from a file (and they can be accessions too)
> >> my @protgis = (10956263);
> >>
> >> foreach my $gi ( @protgis ) {
> >>    my $protseq = $cachepep->get_Seq_by_id($gi);
> >>    if( ! $protseq ) {
> >>       print STDERR "could not find a seq for gi:$gi\n";
> >>       next;
> >>    }
> >>    foreach my $cds (  grep { $_->primary_tag eq 'CDS' }
> >>                            $protseq->get_SeqFeatures() )
> >>    {
> >>       # skip CDSes with no coded_by
> >>       next unless( $cds->has_tag('coded_by') );
> >>       my ($codedby) = $cds->each_tag_value('coded_by');
> >>       my ($ntacc,$loc) = split(/\:/, $codedby);
> >>       # genbank wants an accession not a versioned one
> >>       $ntacc =~ s/(\.\d+)//;
> >>       my $cdslocation = $locfactory->from_string($loc);
> >>       my $cdsfeature = new Bio::SeqFeature::Generic(-location => $cdslocation);
> >>       my $ntseq = $cachent->get_Seq_by_acc($ntacc);
> >>       next unless $ntseq;
> >>       # locate the feature on a seq
> >>       $ntseq->add_SeqFeature($cdsfeature);
> >>       my $cdsseq = $cdsfeature->spliced_seq();
> >>       print "cds seq is ", $cdsseq->seq(), "\n";
> >>   }
> >> }
> >>
> >>
> >>
> >> ============================================================================
> >>              oOOOOo           			Marcelino Suzuki,  Assistant Professor
> >>            oOOO            Chesapeake Biological Lab - Univ of Maryland Center Environm Science
> >>         oOOOOOo.          		PO Box 38, One Williams St Solomons, MD 20688
> >>      .oOOOOOOOOOo.                      suzuki at cbl.umces.edu  - http://cbl.umces.edu
> >>    .oOOOOOOOOOOOOOOooo..    	 Ph 410-326-7291   FAX 410-326-7341
> >> 0000000000000000000000000000000000000000000000000000000000000000000000000000
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at portal.open-bio.org
> >> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu
> >
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

