[Bioperl-l] (no subject)
Luba Pardo
lubapardo at gmail.com
Fri Mar 2 13:47:26 UTC 2007
Thank you all for your advice. It certaintly made my weekend!
Indeed, I could run the example using the RefSeq accesion number. As
suggested earlier by Samuel, I run the command over RefSeq (get_Seq_by_id )
method and it worked even without taking out the version last numbers.
I am attaching the modified script I run (I checked the translated protein
also to verify I got the correct CDS)
use Bio::Factory::FTLocationFactory;
use Bio::DB::RefSeq;
use Bio::DB::GenBank;
my $gp = Bio::DB::RefSeq->new;
my $gb = Bio::DB::GenBank->new;
# factory to turn strings into Bio::Location objects
my $loc_factory = Bio::Factory::FTLocationFactory->new;
open (IN,"refids.txt") or die "\n I can't open the file\n";
open (OUT, ">>refseqfast.txt") or die "\n I can write it\n";
while (<IN>) {
chomp;
my $protein_acc = $_;
#print "que onda $protein_acc\n";
#die;
my $prot_obj = $gp->get_Seq_by_id($protein_acc);
foreach my $feat ( $prot_obj->top_SeqFeatures ) {
if ( $feat->primary_tag eq 'CDS' ) {
# example: 'coded_by="U05729.1:1..122"'
my @coded_by = $feat->each_tag_value('coded_by');
my ($nuc_acc,$loc_str) = split /\:/, $coded_by[0];
print " $nuc_acc\n";
# $nuc_acc = ~s/(\w+).\d+/\1/;
print " $nuc_acc\n";
my $nuc_obj = $gb->get_Seq_by_id($nuc_acc);
# create Bio::Location object from a string
my $loc_object = $loc_factory->from_string($loc_str);
# create a Feature object by using a Location
my $feat_obj = Bio::SeqFeature::Generic->new(-location =>$loc_object);
# associate the Feature object with the nucleotide Seq object
$nuc_obj->add_SeqFeature($feat_obj);
my $cds_obj = $feat_obj->spliced_seq;
print OUT ">",$nuc_acc,"\n",$cds_obj->seq,"\n";
}
}
}
On 02/03/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Luba Pardo wrote:
> > Dear all,
> > Sorry if the questions is too basic but I am trying to learn BioPerl
> > modules. So I am trying to get the CDS sequence from a gi identification
> > protein using the "features" method. I started to run the example of the
> FAQ
> > doc (How do I retrieve a nucleotide coding sequence when I have a
> protein gi
> > number?)
>
> [
>
> http://www.bioperl.org/wiki/FAQ#How_do_I_retrieve_a_nucleotide_coding_sequence_when_I_have_a_protein_gi_number.3F
> ]
>
> [snip]
> > my $protein_gi = '405830';
> > my $prot_obj = $gp->get_Seq_by_id($protein_gi);;
> > foreach my $feat ( $prot_obj->top_SeqFeatures ) {
> > if ( $feat->primary_tag eq 'CDS' ) {
> > # example: 'coded_by="U05729.1:1..122"'
> > my @coded_by = $feat->each_tag_value('coded_by');
> > my ($nuc_acc, $loc_str) = split /\:/, $coded_by[0];
> [snip]
> > The error I got is
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Must specify a query or list of uids to fetch
> [snip]
> > But I can not see where part of the script is that I have to specify a
> list
> > of gi. That very odd. Am I interpreting the script wrong?
>
> If you use warnings you'd have seen a problem on the line with the
> split: @coded_by is empty. This is because you aren't supplying a
> protein GI. In this case it would be 405831, not 405830. 405830 is
> already the nucleotide GI so you don't need to do this stuff with
> coded_by. Use the code in the next section of the FAQ instead:
>
>
> http://www.bioperl.org/wiki/FAQ#How_do_I_get_the_complete_spliced_nucleotide_sequence_from_the_CDS_section.3F
>
>
More information about the Bioperl-l
mailing list