[Bioperl-l] Possible Repeat E-Mail
Warren Gallin
wgallin at ualberta.ca
Wed Apr 16 17:07:49 UTC 2014
OK, I misunderstood the purpose of passing the Bio::DB::GenBank handle to the spliced_sequence method - I thought that used the Accession Number in the CDS feature to go to the GENBANK nucleotide sequence file and grab the nucleic acid sequence.
So am I correct in thinking that the approach of passing a handle to GenBank to the spliced_sequence method only reaches through from one nucleotide sequence entry to another nucleotide sequence entry? Or am I sill missing the point?
Warren
On Apr 16, 2014, at 10:55 AM, Fields, Christopher J <cjfields at illinois.edu> wrote:
> Warren,
>
> Simple explanation. The sequence record you are using is a *protein* record; spliced_seq() works on the instance’s sequence (the protein sequence).
>
> You need the nucleotide sequence associated with this one, so you have to retrieve the nucleotide record associated with it, the one mentioned in the ‘CDS’ feature (the ‘coded_by’ tag) for the file you retrieved:
>
> CDS 1..1158
> /gene="KCNH2"
> /coded_by="NM_001193658.1:14..3490"
> /db_xref="GeneID:100064000"
>
> and then grab the relevant sequence from that.
>
> chris
>
>
> On Apr 16, 2014, at 12:37 AM, Warren Gallin <wgallin at ualberta.ca> wrote:
>
>> Jason,
>>
>> My previous message bounced, presumably because I included an attachment.
>>
>> On the chance that it did not make it through, here is the relevant test case:
>>
>> A script called Test_Script.pl is as follows:
>>
>> ____________________________________________________________________
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use warnings;
>> use DBI;
>> use Bio::Seq;
>> use Bio::DB::EUtilities;
>> use Bio::SeqIO;
>> use Bio::Seq;
>> use Data::Printer;
>> use Bio::DB::GenBank;
>>
>> my $gi = 302393575; #This gi number is for the protein record of a horse ion channel
>> my $spliced_cds;
>> my $na_seq;
>> my %na_vkcnt_id;
>>
>> #Create a database handle to GENBANK for retrieving coding sequences
>>
>> my $gb_db = Bio::DB::GenBank->new();
>>
>>
>> #create a structure for fetching protein records from GENBANK
>>
>> my $factory = Bio::DB::EUtilities->new(
>> -eutil => 'efetch',
>> -db => 'protein',
>> -rettype => 'gb',
>> -email => 'wgallin at ualberta.ca',
>> -id => $gi
>> );
>>
>> my $holding_file = 'protein_records.gb';
>>
>> $factory->get_Response( -file => $holding_file );
>>
>> my $seqin = Bio::SeqIO->new(
>> -file => $holding_file,
>> -format => 'genbank'
>> );
>>
>> while ( my $seq = $seqin->next_seq ) {
>> my $na_acc_gennt;
>> my $hit_gi = $seq->primary_id;
>>
>> for my $feature_object ( $seq->get_SeqFeatures ) {
>>
>> if ( $feature_object->primary_tag eq "CDS" ) {
>> $spliced_cds = $feature_object->spliced_seq($gb_db);
>> $na_seq = $spliced_cds->seq;
>>
>> print "UPDATE gennt SET cds = \"$na_seq\" ;\n";
>> }
>> }
>> }
>> exit;
>>
>> ________________________________________________________
>>
>> When I run it I get the following output:
>>
>>
>> _________________________________________________________
>>
>> warrenglinsmbp2:140414_Update_Flawed_gennt_Entries wgallin$ perl Test_Script.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: API has changed; please use '-db' or '-nosort' for args. See POD for more details.
>> ---------------------------------------------------
>> UPDATE gennt SET cds = "MPVRRGHVAPQNTFLDTIIRKFEGQSRKFIIANARVENCAVIYCNDGFCELCGYSRAEVMQRPCTCDFLHGPRTQRRAAAQIAQALLGAEERKVEISFYRKDGSCFLCLVDVVPVKNEDGAVIMFILNFEVVMEKDMVGSPARDTNHRGPPTSWLATGRAKTFRLKLPALLALTARESTVRPGGAGSTGAPGAVVVDVDLTPAAPSSESLALDEVTAMDNHVAGLGPAEERRALVGPGSPPACAPIPHPSPRAHSLNPDASGSSCSLARTRSRESCASVRRASSADDIEAMRTGLPPPPRHASTGAMHPLRSGLLNSTSDSDLVRYRTISKIPQITLNFVDLKGDPFLASPTSDREIIAPKIKERTHNVTEKVTQVLSLGADVLPEYKLQAPRIHRWTILHYSPFKAVWDWLILLLVIYTAVFTPYSAAFLLKETEEGPPATDCGYACQPLAVVDLIVDIMFIVDILINFRTTYVNANEEVVSHPGRIAVHYFKGWFLIDMVAAIPFDLLIFGSGSEELIGLLKTARLLRLVRVARKLDRYSEYGAAVLFLLMCTFALIAHWLACIWYAIGNMEQPHMDSRIGWLHNLGDQIGKPYNSSGLGGPSIKDKYVTALYFTFSSLTSVGFGNVSPNTNSEKIFSICVMLIGSLMYASIFGNVSAIIQRLYSGTARYHTQMLRVREFIRFHQIPNPLRQRLEEYFQHAWSYTNGIDMNAVLKGFPECLQADICLHLNRSLLQHCKPFRGATKGCLRALAMKFKTTHAPPGDTLVHAGDLLTALYFISRGSIEILRGDVVVAILGKNDIFGEPLNLYARPGKSNGDVRALTYCDLHKIHRDDLLEVLDMYPEFSDHFWSSLEITFNLRDTNMIPGSPGSTELEGGFNRQRKRKLSFRRRTDKDPEQPGEVSALGPGRAGAGPSSRGRPGGPWGESPSSGPSSPESSEDEGPGRSSSPLRLVPFSSPRPPGE!
>> PPGGEPLIEDCEKSSDTCNPLSGAFSGVSNIFSFWGDSRGRQYQELPRCPAPAPSLLNIPLSSPGRRPRGDVESRLDALQRQLNRLETRLSADMATVLQLLQRQMTLVPPAYSAVTTPGPGPTSTSPLLPVSPIPTLTLDSLSQVSQFMACEELPPGAPELPQDGPTRRLSLPGQLGALTSQPLHRHGSDPGS" ;
>> warrenglinsmbp2:140414_Update_Flawed_gennt_Entries wgallin$
>>
>>
>> ___________________________________________________________
>>
>> The problem is that the sequence is being returned as translated CDS rather than the nucleotide sequence of the CDS.
>>
>> This happens with every gi number that I have tried.
>>
>> I fell like I am missing some subtle BioPerl of wisdom, but I can not figure out what that is.
>>
>> Thanks for looking at this.
>>
>> Warren Gallin
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
More information about the Bioperl-l
mailing list