[Bioperl-l] extracting CDS portion of RefSeqs
Amit Indap
indapa at gmail.com
Wed Dec 14 11:18:01 EST 2005
Sorry, I hit send before I finished my email.
Anyway, I want to extract the CDS portion of human RefSeqs. I
downloaded the most recent RefSeq release in GenBank format and was
extracting the CDS portion this way:
foreach my $feat ( $seq->get_SeqFeatures() ) {
    # Only CDS features are of interest
    if( $feat->primary_tag eq 'CDS' ) {
        my $start = $feat->start;
        my $end = $feat->end;
        # Take the CDS span from the parent sequence
        my $seqstr = $seq->subseq($start,$end);
        my $displayid = $seq->display_name;
        my $seqobj = Bio::Seq->new( -display_id => "$displayid:$start..$end",
                                    -seq => $seqstr);
        # With no -file or -fh given, the FASTA output goes to STDOUT
        my $out = Bio::SeqIO->new(-format => 'Fasta');
        $out->write_seq($seqobj);
    }
}
But this is quite slow since the RefSeq GenBank file is quite large.
Is there any way to download just the CDS portion of RefSeq from NCBI? Is
there a quicker BioPerl solution than the one I have?
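For comparison, here is a minimal plain-Perl sketch of one lighter-weight direction: read the flat file one record at a time and pull the CDS coordinates out with a regex instead of building feature objects. It is not BioPerl and has not been benchmarked, so whether it is actually faster is an assumption. The sub names extract_cds and cds_fasta are made up for illustration, and the code assumes each record has a LOCUS line, an ORIGIN block, and a simple "start..end" CDS location (records with join(...), complement(...), or partial locations are skipped).

```perl
use strict;
use warnings;

# Extract the first CDS of one GenBank record (passed as a single string).
# Returns (id, start, end, cds_sequence), or an empty list when the record
# has no simple "start..end" CDS location.
sub extract_cds {
    my ($record) = @_;

    # The locus name is the first token after LOCUS.
    my ($id) = $record =~ /^LOCUS\s+(\S+)/m or return;

    # Only simple locations such as "     CDS             12..345" are handled.
    my ($start, $end) = $record =~ /^[ ]{5}CDS[ ]+(\d+)\.\.(\d+)[ ]*$/m
        or return;

    # Everything after the ORIGIN line is sequence plus position numbers.
    my ($origin) = $record =~ /^ORIGIN[^\n]*\n(.*)/ms or return;
    $origin =~ s/[^A-Za-z]//g;    # drop digits, whitespace, and the "//" line

    return if $end > length($origin);
    return ($id, $start, $end, substr($origin, $start - 1, $end - $start + 1));
}

# Read GenBank records from $in and print each CDS as FASTA to $out.
sub cds_fasta {
    my ($in, $out) = @_;
    local $/ = "//\n";            # assumes each record ends with a "//" line
    while (my $record = <$in>) {
        my ($id, $start, $end, $cds) = extract_cds($record) or next;
        print {$out} ">$id:$start..$end\n$cds\n";
    }
}

# Run on a file only when one is given on the command line.
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    cds_fasta($in, \*STDOUT);
    close $in;
}
```

Saved under a hypothetical name such as extract_cds.pl, it would be run as "perl extract_cds.pl refseq_file.gbff > cds.fasta", where the input file name is a placeholder. The header format "id:start..end" mirrors the display_id used in the loop above.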
Thanks for your help.
Amit
--
Amit Indap
http://www.bscb.cornell.edu/Homepages/Amit_Indap/