[Bioperl-l] extract ncDNA
cjfields at uiuc.edu
Sun Feb 26 14:12:57 UTC 2006
You're not using bioperl. See:
then go to:
On Feb 26, 2006, at 5:51 AM, perlmails at gmail.com wrote:
> Dear Bioperl group,
> I have been working on extracting non-coding DNA (ncDNA) sequences
> from an organism.
> I tried extracting the intergenic sequences from the sense strand
> after filtering out the features (CDS, gene, mRNA, tRNA, rRNA, etc.) from
> the EMBL feature table entries, using Bioperl and an additional
> script (shown below).
> I have now realised that I have a problem extracting the ncDNA sequences
> from the negative strand. Any ideas?
> To extract the ncDNAs from the negative strand, I thought of converting
> the negative-strand coordinates to sense-strand coordinates and
> adding these to the sense-strand coordinates, then filtering out all the
> features (selecting the ncDNAs after discarding the features from the
> EMBL FT) to get all the ncDNAs.
> Is there anything in the Bioperl toolkit that I am missing?
> ##<<<code start>>
> use strict;
> my $EMBL_cord_file = "Organism.feature.cords"; # feature co-ordinates: start \t end
> my $RAW_file = "Organism.raw";
> my $ncDNA_file = "Organism.ncDNA";
> open(EMBLCORD, $EMBL_cord_file) or die "Cannot open $EMBL_cord_file: $!";
> open(RAW, $RAW_file) or die "Cannot open $RAW_file: $!";
> open(OUT, ">$ncDNA_file") or die "Cannot open $ncDNA_file: $!";
> my @dna = <RAW>;
> my $dna = join('', @dna);
> while (<EMBLCORD>) {
>     chomp;
>     my @cords = split /\t/;
>     my $start = $cords[0];
>     my $end = $cords[1];
>     my $replaceString = "\n>$start..$end";
>     substr($dna, $start-1, $end-$start+1, $replaceString);
> }
> print OUT $dna,"\n";
> ##<<<code end>>
> Another thing: since I am reading the whole file into a scalar, the
> script does not complete the extraction of all ncDNAs from the
> sense strand. Obviously, the features are parsed first, before the
> flattening of the 266,000 nt sequence into a single string.
> Any help would be appreciated.
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
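
A note on the coordinate question above, with a rough sketch in plain Perl. EMBL feature locations, including complement(...) ones, are written in the coordinates of the sequence as it appears in the entry, so no conversion should be needed: features from both strands can be pooled, overlapping ones merged, and the gaps between them taken as intergenic regions. The negative-strand version of a gap is then just its reverse complement. The sketch also sidesteps the substr() problem in the quoted script, where each in-place replacement changes the length of $dna and shifts every later coordinate; here the sequence is only read, never modified. The sub names (merge_intervals, intergenic_intervals, reverse_complement) and the toy data are made up for illustration and are not part of Bioperl.

```perl
use strict;
use warnings;

# Merge overlapping or adjacent [start, end] intervals (1-based, inclusive).
sub merge_intervals {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $iv (@sorted) {
        if (@merged && $iv->[0] <= $merged[-1][1] + 1) {
            # Overlaps or touches the previous interval: extend it if needed.
            $merged[-1][1] = $iv->[1] if $iv->[1] > $merged[-1][1];
        }
        else {
            push @merged, [@$iv];    # copy, so the caller's data is untouched
        }
    }
    return @merged;
}

# Return the stretches not covered by any feature as [start, end] pairs.
sub intergenic_intervals {
    my ($seq_length, @features) = @_;
    my @gaps;
    my $next_free = 1;    # first position not yet covered by a feature
    for my $iv (merge_intervals(@features)) {
        push @gaps, [$next_free, $iv->[0] - 1] if $iv->[0] > $next_free;
        $next_free = $iv->[1] + 1;
    }
    push @gaps, [$next_free, $seq_length] if $next_free <= $seq_length;
    return @gaps;
}

# Reverse complement of a plain ACGT string (ambiguity codes are left as-is).
sub reverse_complement {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Toy example: a 16 nt sequence with three features, two of them overlapping.
my $dna      = 'AAACCCGGGTTTACGT';
my @features = ([4, 6], [5, 9], [13, 14]);

for my $gap (intergenic_intervals(length $dna, @features)) {
    my ($start, $end) = @$gap;
    my $fwd = substr($dna, $start - 1, $end - $start + 1);    # read-only substr
    print ">$start..$end\n$fwd\n";
    print ">complement($start..$end)\n", reverse_complement($fwd), "\n";
}
```

With a real entry, the feature coordinates would come from the parsed feature table rather than a hard-coded list; the interval logic stays the same.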
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign