[Bioperl-l] Still trouble with remote joined records

Thu Oct 18 01:42:54 UTC 2007

I must be missing something, but I can not get the procedure outlined  
in FAQ 5.5 to do what I think it should (maybe my expectations are  
incorrect.

I think I have an up-to-date release:

GallinPowerbook:~ wgallin$ perl -MBio::Root::Version -e 'print  
$Bio::Root::Version::VERSION,"\n"'
1.005002102

If I run the following script (not that gi7648673 is a protein record):
___________________________________________________________
#! /usr/bin/perl

#071017 - Small script to test the method for getting the ORF sequence
#using dbh passing into the sequence method.

use strict;
use warnings;
use LWP::Simple;
use Bio::DB::GenPept;
use Bio::DB::GenBank;
use Bio::Seq;

#Create handles to GenBank and GenPept

my $gbh = Bio::DB::GenBank -> new;
my $gph = Bio::DB::GenPept -> new;

#This is the gi number of the entry being tested

my $test_gi = 7648673;

#Get the protein sequence file from GENBANK

my $test_na_record = $gph -> get_Seq_by_gi ($test_gi);

#Go through the features - when the CDS feature is found splice  
together the sequence and translate.

   for my $feat ( $test_na_record->get_SeqFeatures ) {
     if ( $feat->primary_tag eq 'CDS' ) {
        my $cds = $feat->spliced_seq(-db => $gbh, -nosort => 1);
        print $cds->translate->seq, "\n";
     }
   }

exit;
_______________________________________________________________________
I get an error that a protein sequence can not be translated.  I  
thought that by feeding the handle to GenBank into the spliced_seq  
method that it would retrieve the necessary nucleic acid sequence  
records and splice together the specified ranges.

So I tried using the corresponding nucleic acid record, gi7648671,  
which holds the 5' end of the CDS ( I used $gbh on the get_Seq step).

That yielded the correct amino acid sequence for the first half the  
protein, encoded by the sequence in the record itself, but it did not  
retrieve the other nucleic acid record that is specified to contain  
the 3' end of the sequence.

So, as far as I can see, passing the DB handle isn't causing the  
spliced_seq method to go elsewhere for the nucleic acid sequence data.

I thought that was the purpose.

Can anyone enlighten me, or is this a bug?

Warren Gallin