[Bioperl-l] FileCache.pm error
Marcelino Suzuki
suzuki at cbl.umces.edu
Mon Jun 21 11:33:24 EDT 2004
Thanks Jason. That worked.
I have another question. The script works well, but I was wondering
whether I can get the same CDS sequences in GenBank format. Using sed
and awk on a BLAST search result, I was able to create an HTML file
containing links to all 400 such sequences for the proteins I am
working with, e.g.:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=34112904&itemID=36&view=gbwithparts
I could get each sequence individually using the browser, but is
there a way to batch those requests using BioPerl?
Thanks
Marcelino
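
One way the batching asked about above might be done, as a minimal sketch
rather than a tested recipe: pull the numeric val= identifiers out of the
saved HTML file, then ask Bio::DB::GenBank for them in chunks with
get_Stream_by_id and write each record through Bio::SeqIO in genbank format.
The helper names (extract_ids, fetch_genbank), the chunk size of 50, and the
3-second pause are illustrative assumptions, not anything from the thread. It
also assumes the identifiers are nucleotide GIs; for protein GIs,
Bio::DB::GenPept would be the analogous class.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull the numeric "val=" identifiers out of text containing Entrez viewer
# links. Duplicates are dropped and first-seen order is kept.
sub extract_ids {
    my ($text) = @_;
    my (%seen, @ids);
    while ( $text =~ /\bval=(\d+)/g ) {
        push @ids, $1 unless $seen{$1}++;
    }
    return @ids;
}

# Fetch the records for the given ids in chunks and append them to one
# GenBank-format file. BioPerl is loaded here, at call time, so the parsing
# helper above can be used without it.
sub fetch_genbank {
    my ($ids, $outfile) = @_;
    require Bio::DB::GenBank;
    require Bio::SeqIO;

    my $gb  = Bio::DB::GenBank->new();
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'genbank');

    my $chunk_size = 50;    # assumed value, chosen only for illustration
    my @queue      = @$ids;
    while ( my @chunk = splice(@queue, 0, $chunk_size) ) {
        my $stream = $gb->get_Stream_by_id(\@chunk);
        while ( my $seq = $stream->next_seq ) {
            $out->write_seq($seq);
        }
        sleep 3;            # assumed pause between requests to the server
    }
    return;
}

# Run only when called as: script.pl links.html output.gb
if ( @ARGV == 2 ) {
    my ($html, $outfile) = @ARGV;
    open my $fh, '<', $html or die "cannot read $html: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    my @ids = extract_ids($text);
    fetch_genbank(\@ids, $outfile);
}
```

Keeping the id extraction separate from the network call means the list of
identifiers can be checked before any request is sent.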
On Jun 21, 2004, at 1:22 AM, Jason Stajich wrote:
> Did you make the directory
> /tmp/cache
> on your machine?
>
> The FileCache stuff is overkill depending on what you want to do.
>
> You can also leave it out by just saying:
>
> my $cachent = $ntdb;
> my $cachepep= $pepdb;
>
> -jason
> On Sun, 20 Jun 2004, Marcelino Suzuki wrote:
>
>> I am trying to run the script below, written by Jason Stajich for
>> getting CDS out of GenBank, which I saved as test2.pl. I get the
>> following error message, which I believe is caused by my BioPerl
>> configuration (I just installed BioPerl on Mac OS X):
>>
>> ------------- EXCEPTION -------------
>> MSG: Could not open primary index file
>> STACK Bio::DB::FileCache::_open_database /Library/Perl/5.8.1/Bio/DB/FileCache.pm:321
>> STACK Bio::DB::FileCache::new /Library/Perl/5.8.1/Bio/DB/FileCache.pm:127
>> STACK toplevel test2.pl:14
>>
>> Does anyone have any idea why I get this error?
>>
>> Thanks
>>
>> Marcelino
>>
>>
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::DB::GenBank;
>> use Bio::DB::GenPept;
>> use Bio::DB::FileCache;
>> use Bio::Factory::FTLocationFactory;
>> use Bio::SeqFeature::Generic;
>>
>> my $ntdb  = Bio::DB::GenBank->new;
>> my $pepdb = Bio::DB::GenPept->new;
>>
>> # do some caching in the event you're pulling up the same
>> # chromosome and/or you are debugging
>> my $cachent = Bio::DB::FileCache->new(-kept  => 1,
>>                                       -file  => '/tmp/cache/nt.idx',
>>                                       -seqdb => $ntdb);
>>
>> my $cachepep = Bio::DB::FileCache->new(-kept  => 1,
>>                                        -file  => '/tmp/cache/pep.idx',
>>                                        -seqdb => $pepdb);
>>
>> # obj to turn strings into Bio::Location object
>> my $locfactory = Bio::Factory::FTLocationFactory->new;
>>
>> # you might get these from a file (and they can be accessions too)
>> my @protgis = (10956263);
>>
>> foreach my $gi ( @protgis ) {
>>     my $protseq = $cachepep->get_Seq_by_id($gi);
>>     if( ! $protseq ) {
>>         print STDERR "could not find a seq for gi:$gi\n";
>>         next;
>>     }
>>     foreach my $cds ( grep { $_->primary_tag eq 'CDS' }
>>                       $protseq->get_SeqFeatures() )
>>     {
>>         # skip CDSes with no coded_by
>>         next unless( $cds->has_tag('coded_by') );
>>         my ($codedby) = $cds->each_tag_value('coded_by');
>>         my ($ntacc,$loc) = split(/\:/, $codedby);
>>         # genbank wants an accession not a versioned one
>>         $ntacc =~ s/(\.\d+)//;
>>         my $cdslocation = $locfactory->from_string($loc);
>>         my $cdsfeature = Bio::SeqFeature::Generic->new(-location => $cdslocation);
>>         my $ntseq = $cachent->get_Seq_by_acc($ntacc);
>>         next unless $ntseq;
>>         # locate the feature on a seq
>>         $ntseq->add_SeqFeature($cdsfeature);
>>         my $cdsseq = $cdsfeature->spliced_seq();
>>         print "cds seq is ", $cdsseq->seq(), "\n";
>>     }
>> }
>>
>>
>>
>> ============================================================================
>> oOOOOo Marcelino Suzuki, Assistant Professor
>> oOOO Chesapeake Biological Lab - Univ of Maryland
>> Center Environm Science
>> oOOOOOo. PO Box 38, One Williams St Solomons, MD 20688
>> .oOOOOOOOOOo. suzuki at cbl.umces.edu - http://cbl.umces.edu
>> .oOOOOOOOOOOOOOOooo.. Ph 410-326-7291 FAX 410-326-7341
>> 0000000000000000000000000000000000000000000000000000000000000000000000000000
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>
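
Jason's two suggestions quoted above (create /tmp/cache, or leave the cache
out) could be combined so the script does not die when the directory is
missing. The following is only a sketch under assumptions: ensure_cache_dir
and maybe_cached are hypothetical helper names, and the Bio::DB::FileCache
arguments are copied unchanged from the quoted script rather than checked
against the module documentation.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);

# Return 1 when $dir exists (creating it if needed) and is writable,
# otherwise 0. A failure inside make_path is trapped, not fatal.
sub ensure_cache_dir {
    my ($dir) = @_;
    eval { make_path($dir) } unless -d $dir;
    return ( -d $dir && -w $dir ) ? 1 : 0;
}

# Wrap $db in a Bio::DB::FileCache when the directory holding $index_file
# is usable. Otherwise hand back $db itself, which is the uncached
# fallback ("my $cachent = $ntdb;").
sub maybe_cached {
    my ($db, $index_file) = @_;
    return $db unless ensure_cache_dir( dirname($index_file) );
    require Bio::DB::FileCache;    # loaded only when the cache is used
    return Bio::DB::FileCache->new(-kept  => 1,
                                   -file  => $index_file,
                                   -seqdb => $db);
}

# Usage, with the handles from the quoted script:
#   my $cachent  = maybe_cached($ntdb,  '/tmp/cache/nt.idx');
#   my $cachepep = maybe_cached($pepdb, '/tmp/cache/pep.idx');
```

Either branch returns an object that answers get_Seq_by_id and
get_Seq_by_acc, so the rest of the script would not need to change.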