[Bioperl-l] how to work on two txt files simultaneously by handling corresponding lines from each file
Alex Zhang
mayagao1999 at yahoo.com
Mon Jul 18 17:06:10 EDT 2005
Dear All,
Sorry to bother you again.
I have two text files to process: "short_sequences" and
"long_sequences". "short_sequences" holds 100 short sequences
(8 nucleotides each) and "long_sequences" holds 100 long sequences
(200 nucleotides each).
For example, the first short sequence is "TTGACATA"
and the first long sequence is
"GAATCATATATTAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
GAACCTTGGACTAACCACTGTCTGGATA".
Basically, I want to generate a random position in the long sequence
and use it as the starting site for replacing a substring of the long
sequence with the short sequence. In this example, if we choose the
5th nucleotide of the long sequence as the starting site, then after
replacement with "TTGACATA" the long sequence becomes
"GAATTTGACATAAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
GAACCTTGGACTAACCACTGTCTGGATA".
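To make the example concrete, here is a minimal Perl sketch of that single replacement. It assumes the starting site is counted from 1 (so the 5th nucleotide is offset 4 for substr()); the variable names are only illustrative.

```perl
use strict;
use warnings;

# Sequences copied from the example above.
my $short = "TTGACATA";
my $long  = "GAATCATATATTAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA"
          . "GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT"
          . "CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC"
          . "GAACCTTGGACTAACCACTGTCTGGATA";

my $start  = 5;             # 1-based starting site, as in the example
my $offset = $start - 1;    # substr() counts from 0

# Four-argument substr() overwrites length($short) characters in place,
# so the long sequence keeps its original length.
substr($long, $offset, length($short), $short);

print "$long\n";
```

Because the replacement has the same length as the piece it overwrites, the long sequence stays 200 nucleotides long.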
Then I want to do the same for the 2nd long sequence using the 2nd
short sequence, and so on, until the last long sequence has been
processed. I think the only constraint is that the starting site
must not be larger than 193 (counting from 1); otherwise there are
not enough nucleotides left in the long sequence for the replacement.
Furthermore, I want to keep track of the starting replacement site
for each long sequence.
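The bound on the starting site can be sketched as follows. This is only an illustration under the lengths stated above (200 and 8); random_offset is a hypothetical helper name, and the @sites array is just one possible way to keep track of the starting sites.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a random 0-based offset such that
# $short_len characters still fit inside a sequence of $long_len.
sub random_offset {
    my ($long_len, $short_len) = @_;
    die "short sequence is longer than long sequence\n"
        if $short_len > $long_len;
    # rand(N) returns a value in [0, N), so int() yields 0 .. N-1.
    return int(rand($long_len - $short_len + 1));
}

my @sites;    # 1-based starting site for each long sequence, in order
for my $i (0 .. 4) {
    my $offset = random_offset(200, 8);    # 0 .. 192
    push @sites, $offset + 1;              # 1 .. 193
    printf "sequence %d: starting site %3d\n", $i + 1, $sites[$i];
}
```

With 200 and 8 the largest 0-based offset is 192, which corresponds to starting site 193 when counting from 1.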
I am copying my code below.
******************************************
use strict;
use warnings;
my (@short, @long, $offset);  # @short holds the short sequences,
                              # @long holds the long sequences

open(my $short_fh, '<', "short_sequences.txt")
    || die "Can't open short_sequences.txt: $!\n";
while (<$short_fh>) {
    chomp;
    push(@short, $_);
}
close $short_fh;  # Close the file
open(my $long_fh, '<', "long_sequences.txt")
    || die "Can't open long_sequences.txt: $!\n";
while (<$long_fh>) {
    chomp;
    push(@long, $_);
}
close $long_fh;  # Close the file
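# replacement
foreach my $short (@short) {
    foreach my $long (@long) {
        # Largest valid 0-based offset is length($long) - length($short),
        # i.e. 192 here (starting site 193 when counting from 1).
        $offset = int(rand(length($long) - length($short) + 1));
        substr($long, $offset, length($short), $short);
        printf "%3d", $offset + 1;
        print "\n", $long, "\n";
    }
}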
********************************************
But I just realized that there is a problem with the two nested
loops: each short sequence is used to replace part of every long
sequence, not just the corresponding one.
So I am seeking your suggestions on how to process the two files
simultaneously for my case.
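One possible approach, given here as a sketch under assumptions rather than a definitive answer, is to read one line from each file per iteration, so that the Nth short sequence is only ever paired with the Nth long sequence. The helper name replace_in_pairs is hypothetical, the demo uses in-memory filehandles with made-up sequences in place of the real files, and it assumes each long sequence is at least as long as its short sequence.

```perl
use strict;
use warnings;

# Hypothetical helper: reads the two handles in lockstep and returns a
# list of [starting_site, modified_long_sequence] pairs.
sub replace_in_pairs {
    my ($short_fh, $long_fh) = @_;
    my @results;
    while (defined(my $short = <$short_fh>)) {
        my $long = <$long_fh>;
        last unless defined $long;    # long file ran out first
        chomp($short, $long);
        # Random 0-based offset that leaves room for the short sequence.
        my $offset = int(rand(length($long) - length($short) + 1));
        substr($long, $offset, length($short), $short);
        push @results, [$offset + 1, $long];    # keep the 1-based site
    }
    return @results;
}

# Demo data in in-memory filehandles; real code would open
# short_sequences.txt and long_sequences.txt instead.
my $short_data = "TTGACATA\nCCCCCCCC\n";
my $long_data  = ("A" x 200) . "\n" . ("G" x 200) . "\n";
open(my $short_in, '<', \$short_data) or die "Can't open demo data: $!\n";
open(my $long_in,  '<', \$long_data)  or die "Can't open demo data: $!\n";

for my $pair (replace_in_pairs($short_in, $long_in)) {
    my ($site, $seq) = @$pair;
    printf "%3d\n%s\n", $site, $seq;
}
```

If both files have already been read into @short and @long, looping over a shared index (for my $i (0 .. $#short)) and using $short[$i] with $long[$i] would achieve the same pairing with a single loop.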
Thank you very much and look forward to your reply!
Best Regards,
Alex