[Bioperl-l] extracting CDS portion of RefSeqs

Amit Indap indapa at gmail.com
Wed Dec 14 11:13:31 EST 2005


Hi,

I want to extract the CDS portion of human RefSeqs. I downloaded the
GenBank flat file of the most recent RefSeq release. I was going to
parse the GenBank file and write out the CDS portion of each sequence
like so:

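use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-file   => $ARGV[0],
                            -format => 'GenBank');
# One FASTA writer for all records, printing to STDOUT.
my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'Fasta');

# Read the records one at a time; $seq must come from next_seq.
while ( my $seq = $seqio->next_seq ) {
    foreach my $feat ( $seq->get_SeqFeatures() ) {
        if ( $feat->primary_tag eq 'CDS' ) {
            my $start = $feat->start;
            my $end   = $feat->end;
            # subseq() returns the plain start..end span; a split or
            # minus-strand CDS would need $feat->spliced_seq instead.
            my $seqstr    = $seq->subseq($start, $end);
            my $displayid = $seq->display_name;
            my $seqobj = Bio::Seq->new(-display_id => "$displayid:$start..$end",
                                       -seq        => $seqstr);
            $out->write_seq($seqobj);

#           print STDOUT "Location ",$feat->start,":",
#               $feat->end," GFF[",$feat->gff_string,"]\n";
        }
    }
}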
--
Amit Indap
http://www.bscb.cornell.edu/Homepages/Amit_Indap/


