[Bioperl-l] Possible Repeat E-Mail

Wed Apr 16 18:38:26 UTC 2014

Based on the docs:

https://metacpan.org/pod/Bio::SeqFeatureI#spliced_seq

spliced_seq() would only use the DB handle if you had a split location and if part of the location is remote (neither of which you have).  This is designed for features like those spanning different contigs in an assembly, where one portion of the sequence is on a ‘remote’ fragment, like 'join(AB000123:5567-5589,80..1144)’.  Note the first segment in that example is located on the sequence AB000123, so spliced_seq() would retrieve that sequence from the remote database handle, grab the fragment, and splice it to the local segment.

The location of the feature you have is:

     CDS             1..1158

It’s a simple location (no split), so it simply returns what you already have.

You could probably have a look in the guts of spliced_seq() to see what it’s doing; I think you could pull out something that works for you.

chris

On Apr 16, 2014, at 12:07 PM, Warren Gallin <wgallin at ualberta.ca> wrote:

> OK, I misunderstood the purpose of passing the Bio::DB::GenBank handle to the spliced_sequence method - I thought that used the Accession Number in the CDS feature to go to the GENBANK nucleotide sequence file and grab the nucleic acid sequence.
> 
> 
> So am I correct in thinking that the approach of passing a handle to GenBank to the spliced_sequence method only reaches through from one nucleotide sequence entry to another nucleotide sequence entry?  Or am I sill missing the point?
> 
> 
> Warren
> 
> On Apr 16, 2014, at 10:55 AM, Fields, Christopher J <cjfields at illinois.edu> wrote:
> 
>> Warren, 
>> 
>> Simple explanation.  The sequence record you are using is a *protein* record; spliced_seq() works on the instance’s sequence (the protein sequence).  
>> 
>> You need the nucleotide sequence associated with this one, so you have to retrieve the nucleotide record associated with it, the one mentioned in the ‘CDS’ feature (the ‘coded_by’ tag) for the file you retrieved:
>> 
>>    CDS             1..1158
>>                    /gene="KCNH2"
>>                    /coded_by="NM_001193658.1:14..3490"
>>                    /db_xref="GeneID:100064000"
>> 
>> and then grab the relevant sequence from that.
>> 
>> chris
>> 
>> 
>> On Apr 16, 2014, at 12:37 AM, Warren Gallin <wgallin at ualberta.ca> wrote:
>> 
>>> Jason,
>>> 
>>> 	My previous message bounced, presumably because I included an attachment.
>>> 
>>> 	On the chance that it did not make it through, here is the relevant test case:
>>> 
>>> A script called Test_Script.pl is as follows:
>>> 
>>> ____________________________________________________________________
>>> 
>>> #!/usr/bin/perl 
>>> 
>>> use strict;
>>> use warnings;
>>> use DBI;
>>> use Bio::Seq;
>>> use Bio::DB::EUtilities;
>>> use Bio::SeqIO;
>>> use Bio::Seq;
>>> use Data::Printer;
>>> use Bio::DB::GenBank;
>>> 
>>> my $gi = 302393575;  #This gi number is for the protein record of a horse ion channel
>>> my $spliced_cds;
>>> my $na_seq;
>>> my %na_vkcnt_id;
>>> 
>>> #Create a database handle to GENBANK for retrieving coding sequences
>>> 
>>> my $gb_db = Bio::DB::GenBank->new();
>>> 
>>> 
>>> #create a structure for fetching protein records from GENBANK
>>> 
>>> my $factory = Bio::DB::EUtilities->new(
>>>  -eutil   => 'efetch',
>>>  -db      => 'protein',
>>>  -rettype => 'gb',
>>>  -email   => 'wgallin at ualberta.ca',
>>>  -id      => $gi
>>> );
>>> 
>>> my $holding_file = 'protein_records.gb';
>>> 
>>> $factory->get_Response( -file => $holding_file );
>>> 
>>> my $seqin = Bio::SeqIO->new(
>>>  -file   => $holding_file,
>>>  -format => 'genbank'
>>> );
>>> 
>>> while ( my $seq = $seqin->next_seq ) {
>>>  my $na_acc_gennt;
>>>  my $hit_gi = $seq->primary_id;
>>> 
>>>  for my $feature_object ( $seq->get_SeqFeatures ) {
>>> 
>>>      if ( $feature_object->primary_tag eq "CDS" ) {
>>>          $spliced_cds = $feature_object->spliced_seq($gb_db);
>>>          $na_seq      = $spliced_cds->seq;
>>> 
>>>      print "UPDATE gennt SET cds = \"$na_seq\" ;\n";
>>> 		}
>>>  }
>>> }
>>> exit;
>>> 
>>> ________________________________________________________
>>> 
>>> When I run it I get the following output:
>>> 
>>> 
>>> _________________________________________________________
>>> 
>>> warrenglinsmbp2:140414_Update_Flawed_gennt_Entries wgallin$ perl Test_Script.pl
>>> 
>>> --------------------- WARNING ---------------------
>>> MSG: API has changed; please use '-db' or '-nosort' for args. See POD for more details.
>>> ---------------------------------------------------
>>> UPDATE gennt SET cds = "MPVRRGHVAPQNTFLDTIIRKFEGQSRKFIIANARVENCAVIYCNDGFCELCGYSRAEVMQRPCTCDFLHGPRTQRRAAAQIAQALLGAEERKVEISFYRKDGSCFLCLVDVVPVKNEDGAVIMFILNFEVVMEKDMVGSPARDTNHRGPPTSWLATGRAKTFRLKLPALLALTARESTVRPGGAGSTGAPGAVVVDVDLTPAAPSSESLALDEVTAMDNHVAGLGPAEERRALVGPGSPPACAPIPHPSPRAHSLNPDASGSSCSLARTRSRESCASVRRASSADDIEAMRTGLPPPPRHASTGAMHPLRSGLLNSTSDSDLVRYRTISKIPQITLNFVDLKGDPFLASPTSDREIIAPKIKERTHNVTEKVTQVLSLGADVLPEYKLQAPRIHRWTILHYSPFKAVWDWLILLLVIYTAVFTPYSAAFLLKETEEGPPATDCGYACQPLAVVDLIVDIMFIVDILINFRTTYVNANEEVVSHPGRIAVHYFKGWFLIDMVAAIPFDLLIFGSGSEELIGLLKTARLLRLVRVARKLDRYSEYGAAVLFLLMCTFALIAHWLACIWYAIGNMEQPHMDSRIGWLHNLGDQIGKPYNSSGLGGPSIKDKYVTALYFTFSSLTSVGFGNVSPNTNSEKIFSICVMLIGSLMYASIFGNVSAIIQRLYSGTARYHTQMLRVREFIRFHQIPNPLRQRLEEYFQHAWSYTNGIDMNAVLKGFPECLQADICLHLNRSLLQHCKPFRGATKGCLRALAMKFKTTHAPPGDTLVHAGDLLTALYFISRGSIEILRGDVVVAILGKNDIFGEPLNLYARPGKSNGDVRALTYCDLHKIHRDDLLEVLDMYPEFSDHFWSSLEITFNLRDTNMIPGSPGSTELEGGFNRQRKRKLSFRRRTDKDPEQPGEVSALGPGRAGAGPSSRGRPGGPWGESPSSGPSSPESSEDEGPGRSSSPLRLVPFSSPRPPGE!
>>> PPGGEPLIEDCEKSSDTCNPLSGAFSGVSNIFSFWGDSRGRQYQELPRCPAPAPSLLNIPLSSPGRRPRGDVESRLDALQRQLNRLETRLSADMATVLQLLQRQMTLVPPAYSAVTTPGPGPTSTSPLLPVSPIPTLTLDSLSQVSQFMACEELPPGAPELPQDGPTRRLSLPGQLGALTSQPLHRHGSDPGS" ;
>>> warrenglinsmbp2:140414_Update_Flawed_gennt_Entries wgallin$ 
>>> 
>>> 
>>> ___________________________________________________________
>>> 
>>> The problem is that the sequence is being returned as translated CDS rather than the nucleotide sequence of the CDS.
>>> 
>>> This happens with every gi number that I have tried.
>>> 
>>> I fell like I am missing some subtle BioPerl of wisdom, but I can not figure out what that is.
>>> 
>>> Thanks for looking at this.
>>> 
>>> Warren Gallin
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>