[Bioperl-l] (no subject)

Fri Mar 2 13:47:26 UTC 2007

Thank you all for your advice. It certaintly made my weekend!
Indeed, I could run the example using the RefSeq accesion number. As
suggested earlier by Samuel, I run the command over RefSeq (get_Seq_by_id )
method and it worked even without taking out the version last numbers.
I am attaching the modified script I run (I checked the translated protein
also to verify I got the correct CDS)

use Bio::Factory::FTLocationFactory;
use Bio::DB::RefSeq;
use Bio::DB::GenBank;

my $gp = Bio::DB::RefSeq->new;

my $gb = Bio::DB::GenBank->new;
# factory to turn strings into Bio::Location objects

my $loc_factory = Bio::Factory::FTLocationFactory->new;
 open (IN,"refids.txt") or die "\n I can't open the file\n";
 open (OUT, ">>refseqfast.txt") or die "\n I can write it\n";
 while (<IN>) {
 chomp;

my $protein_acc = $_;

#print "que onda $protein_acc\n";

#die;
my $prot_obj = $gp->get_Seq_by_id($protein_acc);

foreach my $feat ( $prot_obj->top_SeqFeatures ) {

   if ( $feat->primary_tag eq 'CDS' ) {
   # example: 'coded_by="U05729.1:1..122"'

   my @coded_by = $feat->each_tag_value('coded_by');

   my ($nuc_acc,$loc_str) = split /\:/, $coded_by[0];
   print "  $nuc_acc\n";
 #   $nuc_acc = ~s/(\w+).\d+/\1/;

    print "  $nuc_acc\n";

   my $nuc_obj = $gb->get_Seq_by_id($nuc_acc);
   # create Bio::Location object from a string

   my $loc_object = $loc_factory->from_string($loc_str);
   # create a Feature object by using a Location

   my $feat_obj = Bio::SeqFeature::Generic->new(-location =>$loc_object);
   # associate the Feature object with the nucleotide Seq object

   $nuc_obj->add_SeqFeature($feat_obj);

    my $cds_obj = $feat_obj->spliced_seq;

    print OUT  ">",$nuc_acc,"\n",$cds_obj->seq,"\n";
   }
}
  }

On 02/03/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Luba Pardo wrote:
> > Dear all,
> > Sorry if the questions is too basic but I am trying to learn BioPerl
> > modules. So I am trying to get the CDS sequence from a gi identification
> > protein using the "features" method. I started to run the example of the
> FAQ
> > doc (How do I retrieve a nucleotide coding sequence when I have a
> protein gi
> > number?)
>
> [
>
> http://www.bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F
> ]
>
> [snip]
> > my $protein_gi = '405830';
> > my $prot_obj = $gp->get_Seq_by_id($protein_gi);;
> > foreach my $feat ( $prot_obj->top_SeqFeatures ) {
> >    if ( $feat->primary_tag eq 'CDS' ) {
> >    # example: 'coded_by="U05729.1:1..122"'
> >    my @coded_by = $feat->each_tag_value('coded_by');
> >    my ($nuc_acc, $loc_str) = split /\:/, $coded_by[0];
> [snip]
> > The error I got is
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Must specify a query or list of uids to fetch
> [snip]
> > But I can not see where part of the script is that I have to specify a
> list
> > of gi. That very odd. Am I interpreting the script wrong?
>
> If you use warnings you'd have seen a problem on the line with the
> split: @coded_by is empty. This is because you aren't supplying a
> protein GI. In this case it would be 405831, not 405830. 405830 is
> already the nucleotide GI so you don't need to do this stuff with
> coded_by. Use the code in the next section of the FAQ instead:
>
>
> http://www.bioperl.org/wiki/FAQ#How_do_I_get_the_complete_spliced_nucleotide_sequence_from_the_CDS_section.3F
>
>