[Bioperl-l] Another problem with a joined feature

Fri Apr 18 00:55:29 UTC 2014

The problem is with NCBI here -

if you write back out the sequence you see the 1st exon is the only
location for the feature.

   CDS             226..611
                     /db_xref="GI:4028015"
                     /codon_start=1
                     /protein_id="AAC96101.1"
                     /gene="KCNQ3"

If you turn on verbose and use the tempfile method

my $gb_db = Bio::DB::GenBank->new( -db => 'genbank',
   -retrievaltype => 'tempfile',
   -verbose => 1,
   -rettype => 'gbwithparts' );

you will see the raw data that is being passed into SeqIO only has the
single location.

Also you can see the raw file here:
GET '
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&rettype=gbwithparts&db=nucleotide&tool=bioperl&id=AF071478.1&usehistory=n
'

I am not sure right now how to tell NCBI to give you the remote features
locations in the file it downloads but that is the problem not with
spliced_seq.

Jason

Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason
http://twitter.com/hyphaltip

On Thu, Apr 17, 2014 at 2:26 PM, Warren Gallin <wgallin at ualberta.ca> wrote:

> I have encountered another problem with several GenBank nucleotide
> sequence records.
>
> The exemplar is accession number AF071478.1 which is a sequence of a
> specific human exon, annotated with a CDS feature that describes the whole
> protein sequence  of which this exon anodes only a part.
>
> The CDS feature is described as a join of pieces of sequence from this
> record and several other records.  However, when I try to obtain the full
> css I am only getting the part of the sequence from the AF071478.1 record,
> not the others.
>
> The following is a minimal script that illustrates this behaviour.
>
> My understanding was that if a feature was described as a join of multiple
> nucleotide sequence records then the spliced sequence() method with a
> handle to a GenBank created using Bio::DB::GenBank should also collect and
> splice in the fragments of nucleotide sequence from other GenBank records
> to create the final spliced sequence.  Could someone explain where my
> understanding is going awry?
>
> Once again, thanks for any help.
>
> Warren Gallin
>
> ______________________________________________________________________
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
> use DBI;
> use Bio::Seq;
> use Bio::DB::EUtilities;
> use Bio::SeqIO;
> use Bio::Seq;
> use Data::Printer;
> use Bio::DB::GenBank;
>
> #Test script for collecting full nucleic acid sequences from a GenBank
> nucleotide
> #sequence record that has a feature annotated as joins between parts of a
> number of
> #individual nucleotide records.
>
> my @coded_by = ();
> #Single example of a nucleotide sequence entry with a CDS feature that
> includes joining
> #with the sequences from several other records.
>
> my $na_acc = 'AF071478.1';
>
> #The corresponding protein record for the targetted CDS sequence
>
> my $hit_acc = 'AAC96101.1';
>
> #Create a database handle to GENBANK for retrieving coding sequences,
> using the gbwithparts
> #as return type to accomodate large records if they occur.
>
> my $gb_db = Bio::DB::GenBank->new( -db => 'genbank', -rettype =>
> 'gbwithparts' );
>
> #Retrieve the genbank record for the test entry
>
> my $temp_nuc_record = $gb_db->get_Seq_by_acc($na_acc);
>
> foreach my $feat ( $temp_nuc_record->top_SeqFeatures ) {
>     if ( $feat->primary_tag eq 'CDS' ) {
>         eval {
>                 @coded_by = $feat->each_tag_value('protein_id');
>         };
>         if ($@) {
>             print
> "Could not find any protein_id tags for $hit_acc nucleotide $na_acc.\n
> $@\n";
>             exit;
>         }
>         if ( $coded_by[0] eq $hit_acc ) {
>             my $na_seq = $feat->spliced_seq( -db => $gb_db )->seq;
>             print "$na_seq\n";
>
>         }
>     }
> }
> exit;
>
>
> ______________________________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>