[Bioperl-l] extracting CDS portion of RefSeqs

Scott Markel smarkel at scitegic.com
Wed Dec 14 12:31:02 EST 2005


Amit,

You can make a seq call directly on $feat.  This reduces
your code to the following.  Note that I also moved the
creation of $out outside the loop.

my $seqio = Bio::SeqIO->new(-file   => $ARGV[0],
		            -format => 'GenBank');

my $out = Bio::SeqIO->new(-file   => $ARGV[1],
                           -format => 'Fasta');

foreach my $feat ( $seq->get_SeqFeatures() ) {
              if( $feat->primary_tag eq 'CDS' ) {
		 my $seqobj = $feat->seq;
		 $out->write_seq($seqobj);
              }
          }

Scott

Amit Indap wrote:

> Hi,
> 
> I want to extract the CDS portion of human refseqs. I downloaded the
> genbank flat file of the most recent Refseq release. I was going to
> parse the Genbank file and write out the CDS porition of the sequence
> like so:
> 
> my $seqio = Bio::SeqIO->new(-file => $ARGV[0],
> 		      -format => 'GenBank');
> 
> 
> foreach my $feat ( $seq->get_SeqFeatures() ) {
>              if( $feat->primary_tag eq 'CDS' ) {
> 		 my $start = $feat->start;
> 		 my $end = $feat->end;
> 	 my $seqstr   = $seq->subseq($start,$end); #
> 		 my $displayid = $seq->display_name;
> 		 #my $seqobj = Bio::Seq->new( -display_id => "$displayid:$start..$end",
> 		#			     -seq => $seqstr);
> 		# my $out = Bio::SeqIO->new(-format => 'Fasta');
> 		# $out->write_seq($seqobj);
> 		
> 		
> #                 print STDOUT "Location ",$feat->start,":",
> #                    $feat->end," GFF[",$feat->gff_string,"]\n";
>              }
>          }
> --
> Amit Indap
> http://www.bscb.cornell.edu/Homepages/Amit_Indap/
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com



More information about the Bioperl-l mailing list