[Bioperl-l] FileCache.pm error
Jason Stajich
jason at cgt.duhs.duke.edu
Mon Jun 21 14:39:09 EDT 2004
You might want to try Bio::SearchIO for parsing BLAST output, but sed and awk
will work, I guess.
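For comparison with the sed/awk approach, here is a minimal Bio::SearchIO sketch that pulls GI numbers out of a BLAST report. It assumes the hit names are NCBI-style identifiers such as "gi|34112904|gb|AE017180.1|" (common for nr searches at the time). The helper names `gi_from_hit_name` and `gis_from_blast_report` are hypothetical, and BioPerl is loaded at run time so the pure-Perl helper can be used without it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull the GI number out of an NCBI-style hit name such as
# "gi|34112904|gb|AE017180.1|"; returns undef when there is no gi| field.
sub gi_from_hit_name {
    my ($name) = @_;
    return $name =~ /(?:^|\|)gi\|(\d+)/ ? $1 : undef;
}

# Walk every result and hit in a BLAST report with Bio::SearchIO and
# collect unique GI numbers in order of first appearance.
sub gis_from_blast_report {
    my ($file) = @_;
    require Bio::SearchIO;    # loaded at run time; only this sub needs BioPerl
    my $searchio = Bio::SearchIO->new(-format => 'blast', -file => $file);
    my (%seen, @gis);
    while (my $result = $searchio->next_result) {
        while (my $hit = $result->next_hit) {
            my $gi = gi_from_hit_name($hit->name);
            push @gis, $gi if defined $gi && !$seen{$gi}++;
        }
    }
    return @gis;
}

# usage: perl blast_gis.pl report.blast
if (@ARGV) {
    print "$_\n" for gis_from_blast_report($ARGV[0]);
}
```

The resulting list of GIs can be fed straight into a batch fetch instead of building an HTML page of links.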
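To write sequences in GenBank format:
use Bio::SeqIO;
# with no -file or -fh argument, output goes to STDOUT
my $out = Bio::SeqIO->new(-format => 'genbank');
$out->write_seq($cdsseq);  # $cdsseq as returned by spliced_seq()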
If you want to fetch records in batch from GenBank, see Bio::DB::GenBank.
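A minimal sketch of batching the browser requests with Bio::DB::GenBank, assuming the `val` parameter of each Entrez viewer link (e.g. `viewer.fcgi?val=34112904&itemID=36&view=gbwithparts`) is the GI of the record to fetch. The `itemID` parameter is ignored, so links that differ only in `itemID` collapse to a single record, and what comes back is the full GenBank record, not just the CDS. The helper names, the batch size of 50 and the file names are hypothetical choices; fetching needs BioPerl and network access.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract unique GIs from Entrez viewer links in an HTML page
# (assumes the "val" parameter carries the GI).
sub gis_from_links {
    my ($html) = @_;
    my (%seen, @gis);
    while ($html =~ /viewer\.fcgi\?[^"'\s<>]*?\bval=(\d+)/g) {
        push @gis, $1 unless $seen{$1}++;
    }
    return @gis;
}

# Split a list into chunks of at most $size elements.
sub batches {
    my ($size, @items) = @_;
    my @chunks;
    push @chunks, [ splice(@items, 0, $size) ] while @items;
    return @chunks;
}

# Fetch the records chunk by chunk and write them all to one GenBank file.
sub fetch_to_genbank {
    my ($outfile, @gis) = @_;
    require Bio::DB::GenBank;    # run-time load; only this sub needs BioPerl
    require Bio::SeqIO;
    my $db  = Bio::DB::GenBank->new;
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'genbank');
    for my $chunk (batches(50, @gis)) {    # 50 IDs per request is arbitrary
        my $stream = $db->get_Stream_by_id($chunk);
        while (my $seq = $stream->next_seq) {
            $out->write_seq($seq);
        }
    }
}

# usage: perl fetch_gb.pl links.html out.gb
if (@ARGV == 2) {
    my ($htmlfile, $outfile) = @ARGV;
    open my $fh, '<', $htmlfile or die "cannot open $htmlfile: $!";
    my $html = do { local $/; <$fh> };
    close $fh;
    fetch_to_genbank($outfile, gis_from_links($html));
}
```

Sending the IDs in chunks keeps each request to NCBI reasonably small rather than asking for all 400 records at once.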
-jason
On Mon, 21 Jun 2004, Marcelino Suzuki wrote:
> Thanks Jason. That worked.
>
> I have another question. The script works well, but I was wondering
> whether I can get the same CDS sequences in GenBank format. I was able
> to create an HTML file (using sed and awk) from a BLAST search
> containing links to all 400 such sequences from proteins I am working
> with, i.e.:
>
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=34112904&itemID=36&view=gbwithparts
>
> and I could get each sequence individually using the browser, but is
> there a way to batch those requests using BioPerl?
>
> Thanks
>
> Marcelino
> On Jun 21, 2004, at 1:22 AM, Jason Stajich wrote:
>
> > Did you make the directory
> > /tmp/cache
> > on your machine?
> >
> > The FileCache stuff may be overkill, depending on what you want to do.
> >
> > You can also leave it out by just saying:
> >
> > my $cachent = $ntdb;
> > my $cachepep= $pepdb;
> >
> > -jason
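If you do want to keep FileCache, a minimal sketch of creating the index directory from the script itself is below. It assumes the "Could not open primary index file" exception comes from /tmp/cache not existing, as Jason's question suggests; `cache_files` is a hypothetical helper name, and the FileCache lines are shown only as comments.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(mkpath);
use File::Spec;

# Make sure the directory holding the FileCache index files exists
# before Bio::DB::FileCache->new tries to open them; returns both paths.
sub cache_files {
    my ($dir) = @_;
    mkpath($dir) unless -d $dir;
    die "cannot create cache directory $dir\n" unless -d $dir;
    return (File::Spec->catfile($dir, 'nt.idx'),
            File::Spec->catfile($dir, 'pep.idx'));
}

# usage inside the script (needs BioPerl):
#   my ($ntidx, $pepidx) = cache_files('/tmp/cache');
#   my $cachent  = Bio::DB::FileCache->new(-kept => 1, -file => $ntidx,
#                                          -seqdb => $ntdb);
#   my $cachepep = Bio::DB::FileCache->new(-kept => 1, -file => $pepidx,
#                                          -seqdb => $pepdb);
```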
> > On Sun, 20 Jun 2004, Marcelino Suzuki wrote:
> >
> >> I am trying to run the script below (by Jason Stajich) for getting CDSs
> >> out of GenBank, which I saved as test2.pl, and I get the following error
> >> message, which I believe is caused by my BioPerl configuration (I just
> >> installed BioPerl on Mac OS X):
> >>
> >> ------------- EXCEPTION -------------
> >> MSG: Could not open primary index file
> >> STACK Bio::DB::FileCache::_open_database
> >> /Library/Perl/5.8.1/Bio/DB/FileCache.pm:321
> >> STACK Bio::DB::FileCache::new
> >> /Library/Perl/5.8.1/Bio/DB/FileCache.pm:127
> >> STACK toplevel test2.pl:14
> >>
> >> Does anyone have any idea why I get this error?
> >>
> >> Thanks
> >>
> >> Marcelino
> >>
> >>
> >> #!/usr/bin/perl -w
> >> use strict;
> >> use Bio::DB::GenBank;
> >> use Bio::DB::GenPept;
> >> use Bio::DB::FileCache;
> >> use Bio::Factory::FTLocationFactory;
> >> use Bio::SeqFeature::Generic;
> >>
> >> my $ntdb  = Bio::DB::GenBank->new;
> >> my $pepdb = Bio::DB::GenPept->new;
> >>
> >> # do some caching in the event you're pulling up the same
> >> # chromosome and/or you are debugging
> >> my $cachent = Bio::DB::FileCache->new(-kept  => 1,
> >>                                       -file  => '/tmp/cache/nt.idx',
> >>                                       -seqdb => $ntdb);
> >>
> >> my $cachepep = Bio::DB::FileCache->new(-kept  => 1,
> >>                                        -file  => '/tmp/cache/pep.idx',
> >>                                        -seqdb => $pepdb);
> >>
> >> # obj to turn strings into Bio::Location object
> >> my $locfactory = Bio::Factory::FTLocationFactory->new;
> >>
> >> # you might get these from a file (and they can be accessions too)
> >> my @protgis = (10956263);
> >>
> >> foreach my $gi ( @protgis ) {
> >>     my $protseq = $cachepep->get_Seq_by_id($gi);
> >>     if ( ! $protseq ) {
> >>         print STDERR "could not find a seq for gi:$gi\n";
> >>         next;
> >>     }
> >>     foreach my $cds ( grep { $_->primary_tag eq 'CDS' }
> >>                       $protseq->get_SeqFeatures() ) {
> >>         # skip CDSes with no coded_by
> >>         next unless ( $cds->has_tag('coded_by') );
> >>         my ($codedby) = $cds->each_tag_value('coded_by');
> >>         my ($ntacc, $loc) = split(/\:/, $codedby);
> >>         # genbank wants an accession, not a versioned one
> >>         $ntacc =~ s/(\.\d+)//;
> >>         my $cdslocation = $locfactory->from_string($loc);
> >>         my $cdsfeature  = Bio::SeqFeature::Generic->new(-location => $cdslocation);
> >>         my $ntseq = $cachent->get_Seq_by_acc($ntacc);
> >>         next unless $ntseq;
> >>         # locate the feature on a seq
> >>         $ntseq->add_SeqFeature($cdsfeature);
> >>         my $cdsseq = $cdsfeature->spliced_seq();
> >>         print "cds seq is ", $cdsseq->seq(), "\n";
> >>     }
> >> }
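The `split(/\:/, $codedby)` step in the script assumes a coded_by value with a single accession. A minimal sketch of a stricter parse, assuming values look like "AE017180.1:complement(1234..2345)"; values that span several accessions, such as "join(AB000001.1:1..10,AB000002.1:20..30)", are rejected rather than mis-split. `parse_coded_by` is a hypothetical helper, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a coded_by value into an unversioned accession and a location
# string. Returns an empty list when the value does not start with a
# single "ACCESSION[.version]:" prefix or the location names another
# accession, so the caller can skip (or handle) those cases separately.
sub parse_coded_by {
    my ($codedby) = @_;
    # one accession, optional .version, then everything after the colon
    return unless $codedby =~ /^([A-Za-z_0-9]+?)(?:\.\d+)?:(.+)$/;
    my ($acc, $loc) = ($1, $2);
    return if $loc =~ /:/;    # another accession inside the location
    return ($acc, $loc);
}

# In the script's inner loop, this could replace the split and the s///:
#   my ($ntacc, $loc) = parse_coded_by($codedby) or next;
```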
> >>
> >>
> >>
> >> ========================================================================
> >> oOOOOo Marcelino Suzuki, Assistant Professor
> >> oOOO Chesapeake Biological Lab - Univ of Maryland
> >> Center Environm Science
> >> oOOOOOo. PO Box 38, One Williams St Solomons, MD 20688
> >> .oOOOOOOOOOo. suzuki at cbl.umces.edu - http://cbl.umces.edu
> >> .oOOOOOOOOOOOOOOooo.. Ph 410-326-7291 FAX 410-326-7341
> >> 000000000000000000000000000000000000000000000000000000000000000000000000
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at portal.open-bio.org
> >> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu
> >
>