[Bioperl-l] help extracting CDS

Jason Stajich jason@cgt.mc.duke.edu
Mon, 16 Dec 2002 16:08:56 -0500 (EST)


Here is how I do it with the soon-to-be-released Bioperl 1.2.

1.0.2 doesn't have the get_SeqFeatures API or $feature->spliced_seq yet.
Doing it in 1.0.2 is more of a pain in the neck, because you have to write
the splicing logic yourself (in 1.2 it's hidden in spliced_seq). So I don't
know what to tell you other than grab the latest from CVS and roll with it;
all tests reportedly pass on it.

This assumes all CDS annotations are for the sequence contained in the file
and aren't 'remote locations'.  If there are remote locations, you need to
instantiate a Bio::DB::GenBank and pass it in to spliced_seq.
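
For the remote-location case, a minimal sketch might look like the
following. The helper names spliced_cds and genbank_db are made up for
illustration, and passing the database handle as the first positional
argument is an assumption based on the wording above; check the
spliced_seq documentation for the release you are running.

```perl
use strict;
use warnings;

# Hypothetical helper: build the database handle used to fetch remote
# entries. Loaded lazily because it needs network access when used.
sub genbank_db {
    require Bio::DB::GenBank;
    return Bio::DB::GenBank->new;
}

# Hypothetical helper: return the spliced sequence of every CDS feature
# on one Bio::Seq object. If $db is given, it is handed to spliced_seq
# so that sub-locations on other entries can be retrieved (assumed
# positional calling convention).
sub spliced_cds {
    my ($seq, $db) = @_;
    my @cds = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
    return map { $db ? $_->spliced_seq($db) : $_->spliced_seq } @cds;
}
```

Inside the while loop you would then call spliced_cds($seq, $db) with a
handle created once, before the loop, by genbank_db().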

This is now a script in scripts/seq/extract_cds.pl:

#!/usr/bin/perl -w
# Contributed by Jason Stajich <jason@bioperl.org>

# Simply extract the CDS features from a GenBank file and
# write out the CDS and peptide sequences in FASTA format.

use strict;
use Bio::SeqIO;

my $filename = shift || die("pass in a genbank filename on the cmd line");
my $in     = Bio::SeqIO->new(-file => $filename,        -format => 'genbank');
# Output format is stated explicitly rather than left to the default
my $out    = Bio::SeqIO->new(-file => ">$filename.cds", -format => 'fasta');
my $outpep = Bio::SeqIO->new(-file => ">$filename.pep", -format => 'fasta');

while( my $seq = $in->next_seq ) {
  # keep only the features tagged as CDS
  my @cds = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures();
  foreach my $feature ( @cds ) {
    # spliced_seq joins the sub-locations of the feature into one sequence
    my $featureseq = $feature->spliced_seq;
    $out->write_seq($featureseq);
    $outpep->write_seq($featureseq->translate);
  }
}

Cheers,
-jason

On Mon, 16 Dec 2002, Pedro Antonio Reche wrote:

> Hi, I need to extract the CDS from a genbank genome record, saving them
> into a file in fasta format, and I wonder if someone can let me know how
> to do this using bioperl.
> Thanks in advance for any positive consideration.
>
> pedro
>
> *******************************************************************
> PEDRO A. RECHE , pHD		TL: 617 632 3824
> Dana-Farber Cancer Institute,	FX: 617 632 4569
> Harvard Medical School,		EM: reche@research.dfci.harvard.edu
> 44 Binney Street, D1510A,	EM: reche@mifoundation.org
> Boston, MA 02115		URL: http://www.reche.org
> *******************************************************************

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu