[Bioperl-l] how to work on two txt files simultaneously by handle corresponding lines from each file
khoueiry
khoueiry at ibdm.univ-mrs.fr
Mon Jul 18 17:19:47 EDT 2005
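If I understood your idea correctly, I suggest accessing both arrays by index
(see the code below), so that the short and long sequences with the same index
are paired with each other.
I haven't tested this code, but I think it's a good way to solve your problem.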
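# replacement: pair short sequence $i with long sequence $i
my @sites;    # 1-based starting site used for each long sequence
for (my $i = 0; $i <= $#short; $i++) {    # $#short is the LAST index, so use <=
    # the insert must fit: offsets run from 0 to length(long) - length(short),
    # i.e. 0..192 (sites 1..193) for 200 nt and 8 nt sequences;
    # length($long) % 193 would be 7 and only allow offsets 0..6
    $offset = int(rand(length($long[$i]) - length($short[$i]) + 1));
    substr($long[$i], $offset, length($short[$i]), $short[$i]);
    push @sites, $offset + 1;
    printf "%3d\n%s\n", $offset + 1, $long[$i];
}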
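Since the subject asks about handling corresponding lines from two files, another option is to read both filehandles in lockstep instead of loading them into arrays first. This is a minimal sketch, assuming the file names from the question; `replace_in_lockstep` is a hypothetical helper name, and it dies if the two files have different numbers of lines. It has not been tested against the real data.

```perl
use strict;
use warnings;

# Hypothetical helper: pair line N of $short_fh with line N of $long_fh,
# overwrite the long sequence with the short one at a random site, and
# return one [site, new_long] record per pair (sites are 1-based).
sub replace_in_lockstep {
    my ($short_fh, $long_fh) = @_;
    my @records;
    my $n = 0;
    while (defined(my $short = <$short_fh>)) {
        $n++;
        my $long = <$long_fh>;
        die "long file ran out of lines at line $n\n" unless defined $long;
        chomp($short);
        chomp($long);
        die "short sequence longer than long sequence at line $n\n"
            if length($short) > length($long);
        # offsets 0 .. length(long) - length(short) keep the insert inside
        my $offset = int(rand(length($long) - length($short) + 1));
        substr($long, $offset, length($short), $short);
        push @records, [$offset + 1, $long];
    }
    my $extra = <$long_fh>;
    die "long file has more lines than short file\n" if defined $extra;
    return @records;
}

# Example run, only attempted if the question's input files exist.
if (-e 'short_sequences.txt' && -e 'long_sequences.txt') {
    open(my $s, '<', 'short_sequences.txt') or die "Can't open short_sequences.txt: $!\n";
    open(my $l, '<', 'long_sequences.txt')  or die "Can't open long_sequences.txt: $!\n";
    printf "%3d\n%s\n", @$_ for replace_in_lockstep($s, $l);
    close $s;
    close $l;
}
```

Reading line by line keeps only one pair in memory at a time, which matters little for 100 sequences but scales to much larger files.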
On Mon, 18 Jul 2005 14:06:10 -0700 (PDT), Alex Zhang wrote
> Dear All,
>
> Sorry to bother you again.
>
> I have two txt files to handle. One is "short_sequences" and the
> other one is "long_sequences". The "short_sequences" file holds 100
> short sequences (8 nucleotides long) and the "long_sequences" file
> holds 100 long sequences (200 nucleotides long).
>
> For example, the first short sequence is "TTGACATA"
> and the first long sequence is
> "GAATCATATATTAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
> GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
> CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
> GAACCTTGGACTAACCACTGTCTGGATA".
>
> Basically, I want to generate a random position as the starting site
> for replacing a substring of the long sequence with a short sequence.
> In this example, if we choose the 5th nucleotide of the long sequence
> as the starting site, then after replacing with "TTGACATA" the
> long sequence becomes
> "GAATTTGACATAAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
> GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
> CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
> GAACCTTGGACTAACCACTGTCTGGATA".
>
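The example above maps directly onto Perl's 4-argument `substr`: the 5th nucleotide is 0-based offset 4, and the replacement length equals the length of the short sequence, so the long sequence stays 200 nt. A small check using only the sequences given in the question:

```perl
use strict;
use warnings;

# The first long sequence from the question, joined from its wrapped lines.
my $long = 'GAATCATATATTAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA'
         . 'GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT'
         . 'CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC'
         . 'GAACCTTGGACTAACCACTGTCTGGATA';
my $short = 'TTGACATA';
my $site  = 5;    # 1-based starting site, as in the question

# 4-argument substr overwrites length($short) characters in place,
# so nothing is inserted or deleted and the length stays the same
my $replaced = $long;
substr($replaced, $site - 1, length($short), $short);

print length($replaced), "\n";          # 200
print substr($replaced, 0, 20), "\n";   # GAATTTGACATAAGTCTCCA
```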
> Then I want to replace part of the 2nd long sequence with the 2nd
> short sequence, and repeat this until the last long sequence has
> been replaced. I think the only constraint is that the starting site
> must not be larger than 193 (200 - 8 + 1); otherwise there are not
> enough nucleotides left in the long sequence for the replacement.
>
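The 193 limit follows from requiring the insert to end inside the sequence: a short sequence of length s starting at site p ends at p + s - 1, which must be at most the long length l, so p <= l - s + 1. Note that the expression `length($long) % 193` used in the code is a remainder, not a bound: for a 200 nt sequence it is 7, so `rand` would only pick offsets 0 to 6. A minimal sketch of the bound; `max_site` and `random_offset` are hypothetical helper names:

```perl
use strict;
use warnings;

# Largest 1-based starting site that still leaves room for the insert:
# an insert of length $s starting at site p ends at p + $s - 1 <= $l.
sub max_site {
    my ($l, $s) = @_;
    return $l - $s + 1;
}

# rand(N) returns a number in [0, N), so int(rand(N)) is 0 .. N-1;
# with N = max_site this gives 0-based offsets 0 .. $l - $s.
sub random_offset {
    my ($l, $s) = @_;
    die "insert is longer than the sequence\n" if $s > $l;
    return int(rand(max_site($l, $s)));
}

print max_site(200, 8), "\n";    # prints 193
print 200 % 193, "\n";           # prints 7: the modulo bound is far too small
```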
> Furthermore, I want to keep track of the starting replacement site
> for each long sequence.
>
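One way to keep track of the sites is to store `$offset + 1` for each sequence in an array and write it out next to the modified sequence. This is only a sketch; `write_sites` is a hypothetical helper, and the tab-separated layout (index, site, sequence) is an assumed format, not something the question specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: write one tab-separated line per long sequence with
# its 1-based index, its 1-based starting site, and the modified sequence.
sub write_sites {
    my ($out_fh, $sites, $seqs) = @_;
    die "sites and sequences differ in count\n" unless @$sites == @$seqs;
    for my $i (0 .. $#$sites) {
        print {$out_fh} join("\t", $i + 1, $sites->[$i], $seqs->[$i]), "\n";
    }
}
```

It could be called as `write_sites(\*STDOUT, \@sites, \@long)`, where `@sites` is an array filled with `$offset + 1` inside the replacement loop.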
> I am copying my code in the below.
> ******************************************
> use strict;
> use warnings;
>
> my (@short, @long, $offset); # the 'short' array will hold the short
> # sequences while the 'long' array holds the long sequences
>
> open(FILE1, '<', "short_sequences.txt") || die "Can't open short_sequences.txt: $!\n";
> while(<FILE1>){
> chomp;
> push(@short, $_);
> }
> close FILE1; #Close the file
>
> open(FILE2, '<', "long_sequences.txt") || die "Can't open long_sequences.txt: $!\n";
> while(<FILE2>){
> chomp;
> push(@long, $_);
> }
> close FILE2; #Close the file
>
> # replacement
> foreach my $short(@short){
> foreach my $long(@long){
> $offset = int(rand(length($long)%193));
> substr($long,$offset,length($short),$short);
> printf "%3d", $offset+1;
> print "\n", $long, "\n";
>
> }
> }
> ********************************************
>
> But I just realized that there is a problem with the two nested
> loops: each short sequence is used to replace all of the long
> sequences, not just the corresponding one.
>
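The difference is easy to see with small stand-in arrays (the `S1`/`L1` labels are placeholders, not real data): nested loops visit every combination, while a single index visits only corresponding pairs. In the original code, `foreach my $long (@long)` also aliases each array element, so `substr` modifies `@long` in place and every long sequence is overwritten once per short sequence.

```perl
use strict;
use warnings;

my @short = qw(S1 S2 S3);
my @long  = qw(L1 L2 L3);

# Nested loops visit every combination: 3 x 3 = 9 (short, long) pairs.
my @nested;
for my $s (@short) {
    for my $l (@long) {
        push @nested, "$s-$l";
    }
}

# One index over both arrays visits only corresponding pairs: 3 pairs.
die "files have different numbers of lines\n" unless @short == @long;
my @paired = map { "$short[$_]-$long[$_]" } 0 .. $#short;

print scalar(@nested), "\n";    # 9
print "@paired\n";              # S1-L1 S2-L2 S3-L3
```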
> So I would appreciate your suggestions on how to handle the two
> files simultaneously in my case.
>
> Thank you very much; I look forward to your reply!
>
> Best Regards,
> Alex
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l