[Bioperl-l] Possible Repeat E-Mail

Wed Apr 16 17:07:49 UTC 2014

OK, I misunderstood the purpose of passing the Bio::DB::GenBank handle to the spliced_sequence method - I thought that used the Accession Number in the CDS feature to go to the GENBANK nucleotide sequence file and grab the nucleic acid sequence.

So am I correct in thinking that the approach of passing a handle to GenBank to the spliced_sequence method only reaches through from one nucleotide sequence entry to another nucleotide sequence entry?  Or am I sill missing the point?

Warren

On Apr 16, 2014, at 10:55 AM, Fields, Christopher J <cjfields at illinois.edu> wrote:

> Warren, 
> 
> Simple explanation.  The sequence record you are using is a *protein* record; spliced_seq() works on the instance’s sequence (the protein sequence).  
> 
> You need the nucleotide sequence associated with this one, so you have to retrieve the nucleotide record associated with it, the one mentioned in the ‘CDS’ feature (the ‘coded_by’ tag) for the file you retrieved:
> 
>     CDS             1..1158
>                     /gene="KCNH2"
>                     /coded_by="NM_001193658.1:14..3490"
>                     /db_xref="GeneID:100064000"
> 
> and then grab the relevant sequence from that.
> 
> chris
> 
> 
> On Apr 16, 2014, at 12:37 AM, Warren Gallin <wgallin at ualberta.ca> wrote:
> 
>> Jason,
>> 
>> 	My previous message bounced, presumably because I included an attachment.
>> 
>> 	On the chance that it did not make it through, here is the relevant test case:
>> 
>> A script called Test_Script.pl is as follows:
>> 
>> ____________________________________________________________________
>> 
>> #!/usr/bin/perl 
>> 
>> use strict;
>> use warnings;
>> use DBI;
>> use Bio::Seq;
>> use Bio::DB::EUtilities;
>> use Bio::SeqIO;
>> use Bio::Seq;
>> use Data::Printer;
>> use Bio::DB::GenBank;
>> 
>> my $gi = 302393575;  #This gi number is for the protein record of a horse ion channel
>> my $spliced_cds;
>> my $na_seq;
>> my %na_vkcnt_id;
>> 
>> #Create a database handle to GENBANK for retrieving coding sequences
>> 
>> my $gb_db = Bio::DB::GenBank->new();
>> 
>> 
>> #create a structure for fetching protein records from GENBANK
>> 
>> my $factory = Bio::DB::EUtilities->new(
>>   -eutil   => 'efetch',
>>   -db      => 'protein',
>>   -rettype => 'gb',
>>   -email   => 'wgallin at ualberta.ca',
>>   -id      => $gi
>> );
>> 
>> my $holding_file = 'protein_records.gb';
>> 
>> $factory->get_Response( -file => $holding_file );
>> 
>> my $seqin = Bio::SeqIO->new(
>>   -file   => $holding_file,
>>   -format => 'genbank'
>> );
>> 
>> while ( my $seq = $seqin->next_seq ) {
>>   my $na_acc_gennt;
>>   my $hit_gi = $seq->primary_id;
>> 
>>   for my $feature_object ( $seq->get_SeqFeatures ) {
>> 
>>       if ( $feature_object->primary_tag eq "CDS" ) {
>>           $spliced_cds = $feature_object->spliced_seq($gb_db);
>>           $na_seq      = $spliced_cds->seq;
>> 
>>       print "UPDATE gennt SET cds = \"$na_seq\" ;\n";
>> 		}
>>   }
>> }
>> exit;
>> 
>> ________________________________________________________
>> 
>> When I run it I get the following output:
>> 
>> 
>> _________________________________________________________
>> 
>> warrenglinsmbp2:140414_Update_Flawed_gennt_Entries wgallin$ perl Test_Script.pl
>> 
>> --------------------- WARNING ---------------------
>> MSG: API has changed; please use '-db' or '-nosort' for args. See POD for more details.
>> ---------------------------------------------------
>> UPDATE gennt SET cds = "MPVRRGHVAPQNTFLDTIIRKFEGQSRKFIIANARVENCAVIYCNDGFCELCGYSRAEVMQRPCTCDFLHGPRTQRRAAAQIAQALLGAEERKVEISFYRKDGSCFLCLVDVVPVKNEDGAVIMFILNFEVVMEKDMVGSPARDTNHRGPPTSWLATGRAKTFRLKLPALLALTARESTVRPGGAGSTGAPGAVVVDVDLTPAAPSSESLALDEVTAMDNHVAGLGPAEERRALVGPGSPPACAPIPHPSPRAHSLNPDASGSSCSLARTRSRESCASVRRASSADDIEAMRTGLPPPPRHASTGAMHPLRSGLLNSTSDSDLVRYRTISKIPQITLNFVDLKGDPFLASPTSDREIIAPKIKERTHNVTEKVTQVLSLGADVLPEYKLQAPRIHRWTILHYSPFKAVWDWLILLLVIYTAVFTPYSAAFLLKETEEGPPATDCGYACQPLAVVDLIVDIMFIVDILINFRTTYVNANEEVVSHPGRIAVHYFKGWFLIDMVAAIPFDLLIFGSGSEELIGLLKTARLLRLVRVARKLDRYSEYGAAVLFLLMCTFALIAHWLACIWYAIGNMEQPHMDSRIGWLHNLGDQIGKPYNSSGLGGPSIKDKYVTALYFTFSSLTSVGFGNVSPNTNSEKIFSICVMLIGSLMYASIFGNVSAIIQRLYSGTARYHTQMLRVREFIRFHQIPNPLRQRLEEYFQHAWSYTNGIDMNAVLKGFPECLQADICLHLNRSLLQHCKPFRGATKGCLRALAMKFKTTHAPPGDTLVHAGDLLTALYFISRGSIEILRGDVVVAILGKNDIFGEPLNLYARPGKSNGDVRALTYCDLHKIHRDDLLEVLDMYPEFSDHFWSSLEITFNLRDTNMIPGSPGSTELEGGFNRQRKRKLSFRRRTDKDPEQPGEVSALGPGRAGAGPSSRGRPGGPWGESPSSGPSSPESSEDEGPGRSSSPLRLVPFSSPRPPGE!
>> PPGGEPLIEDCEKSSDTCNPLSGAFSGVSNIFSFWGDSRGRQYQELPRCPAPAPSLLNIPLSSPGRRPRGDVESRLDALQRQLNRLETRLSADMATVLQLLQRQMTLVPPAYSAVTTPGPGPTSTSPLLPVSPIPTLTLDSLSQVSQFMACEELPPGAPELPQDGPTRRLSLPGQLGALTSQPLHRHGSDPGS" ;
>> warrenglinsmbp2:140414_Update_Flawed_gennt_Entries wgallin$ 
>> 
>> 
>> ___________________________________________________________
>> 
>> The problem is that the sequence is being returned as translated CDS rather than the nucleotide sequence of the CDS.
>> 
>> This happens with every gi number that I have tried.
>> 
>> I fell like I am missing some subtle BioPerl of wisdom, but I can not figure out what that is.
>> 
>> Thanks for looking at this.
>> 
>> Warren Gallin
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>