[Bioperl-l] extracting CDS portion of RefSeqs
Scott Markel
smarkel at scitegic.com
Wed Dec 14 12:31:02 EST 2005
Amit,
You can make a seq call directly on $feat. This reduces
your code to the following. Note that I also moved the
creation of $out outside the loop.
my $seqio = Bio::SeqIO->new(-file => $ARGV[0],
-format => 'GenBank');
my $out = Bio::SeqIO->new(-file => $ARGV[1],
-format => 'Fasta');
foreach my $feat ( $seq->get_SeqFeatures() ) {
if( $feat->primary_tag eq 'CDS' ) {
my $seqobj = $feat->seq;
$out->write_seq($seqobj);
}
}
Scott
Amit Indap wrote:
> Hi,
>
> I want to extract the CDS portion of human refseqs. I downloaded the
> genbank flat file of the most recent Refseq release. I was going to
> parse the Genbank file and write out the CDS porition of the sequence
> like so:
>
> my $seqio = Bio::SeqIO->new(-file => $ARGV[0],
> -format => 'GenBank');
>
>
> foreach my $feat ( $seq->get_SeqFeatures() ) {
> if( $feat->primary_tag eq 'CDS' ) {
> my $start = $feat->start;
> my $end = $feat->end;
> my $seqstr = $seq->subseq($start,$end); #
> my $displayid = $seq->display_name;
> #my $seqobj = Bio::Seq->new( -display_id => "$displayid:$start..$end",
> # -seq => $seqstr);
> # my $out = Bio::SeqIO->new(-format => 'Fasta');
> # $out->write_seq($seqobj);
>
>
> # print STDOUT "Location ",$feat->start,":",
> # $feat->end," GFF[",$feat->gff_string,"]\n";
> }
> }
> --
> Amit Indap
> http://www.bscb.cornell.edu/Homepages/Amit_Indap/
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Scott Markel, Ph.D.
Principal Bioinformatics Architect email: smarkel at scitegic.com
SciTegic Inc. mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253
San Diego, CA 92123 fax: +1 858 279 8804
USA web: http://www.scitegic.com
More information about the Bioperl-l
mailing list