[Bioperl-l] how to work on two txt files simultaneously by handle corresponding lines from each file

khoueiry khoueiry at ibdm.univ-mrs.fr
Mon Jul 18 17:19:47 EDT 2005


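If I understood your idea correctly, I suggest accessing the two arrays by the
same index, so that each short sequence is paired with its corresponding long
sequence (see the code below).
I haven't tested this code, but I think it's a good way to solve your problem.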


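# replacement
 for (my $i = 0; $i <= $#short; $i++) {   # <= so the last pair is not skipped
     # largest valid 0-based start is length(long) - length(short), i.e. 192 here
     $offset = int(rand(length($long[$i]) - length($short[$i]) + 1));
     printf "%3d", $offset+1;             # report the 1-based starting site
     substr($long[$i], $offset, length($short[$i]), $short[$i]);
     print "\n", $long[$i], "\n";
 }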
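
If you would rather not load both files into arrays first, the same pairing
can be done by reading one line from each file per iteration. The following is
only a sketch under a few assumptions: the sub names replace_at_random and
process_files are hypothetical (they are not from your script), each file holds
one sequence per line, and any extra lines in the longer file are ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: overwrite a randomly placed window of $long with
# $short. Returns the 1-based starting site and the modified sequence.
sub replace_at_random {
    my ($long, $short) = @_;
    my $max_start = length($long) - length($short);  # last valid 0-based start
    die "short sequence is longer than the long sequence\n" if $max_start < 0;
    my $offset = int(rand($max_start + 1));          # 0 .. $max_start inclusive
    substr($long, $offset, length($short), $short);  # 4-argument substr replaces
    return ($offset + 1, $long);
}

# Hypothetical helper: read both files in lockstep so that line N of the
# short file is always paired with line N of the long file.
sub process_files {
    my ($short_file, $long_file) = @_;
    open(my $short_fh, '<', $short_file) or die "Can't open $short_file: $!\n";
    open(my $long_fh,  '<', $long_file)  or die "Can't open $long_file: $!\n";
    my @results;
    while (1) {
        my $short = <$short_fh>;
        my $long  = <$long_fh>;
        last unless defined $short && defined $long;  # stop at the shorter file
        chomp($short, $long);
        push @results, [ replace_at_random($long, $short) ];
    }
    close $short_fh;
    close $long_fh;
    return @results;   # list of [ starting site, replaced sequence ]
}

# Small demonstration on an in-memory pair; the position is random.
my ($site, $replaced) = replace_at_random("GAATCATATATTAGTCTCCACATACTCC", "TTGACATA");
printf "%3d\n%s\n", $site, $replaced;
```

With your files, something like process_files("short_sequences.txt",
"long_sequences.txt") should return one [site, sequence] pair per line, which
also keeps track of the starting site for each long sequence.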
 




On Mon, 18 Jul 2005 14:06:10 -0700 (PDT), Alex Zhang wrote
> Dear All,
> 
> Sorry to bother you again.
> 
> I have two txt files to handle. One is
> "short_sequences" and the other
> one is "long_sequences". The "short_sequences" file holds
> 100 short sequences (8 nucleotides long) and the
> "long_sequences" file holds 100 long sequences
> (200 nucleotides long).
> 
> For example, the first short sequence is "TTGACATA"
> and the first long sequence is
> "GAATCATATATTAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
> GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
> CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
> GAACCTTGGACTAACCACTGTCTGGATA".
> 
> Basically, I want to generate a random position as a
> starting site to replace a substring
> in the long sequence with a short sequence. In this
> example, we can choose a starting site
> as 5th nucleotide in the long sequence, after
> replacing using "TTGACATA", the replaced
> long sequence is
> "GAATTTGACATAAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
> GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
> CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
> GAACCTTGGACTAACCACTGTCTGGATA".
> 
> Then I want to replace the 2nd long sequence with the 2nd
> short sequence and then repeat this over and over
> again until the last long sequence is reached and
> replaced. I think the only problem is that the
> starting site should not be larger than 193.
> Otherwise, there are
> not enough nucleotides in the long sequence for
> replacement.
> 
> Furthermore, I want to keep track of the starting
> replacement site for each long sequence.
> 
> I am copying my code below.
> ******************************************
> use strict;
> use warnings;
> 
> my (@short, @long, $offset); # the 'short' array will hold the short
>                              # sequences while the 'long' array holds
>                              # the long sequences
> 
> open(FILE1, '<', "short_sequences.txt") || die "Can't open short_sequences.txt: $!\n";
> while(<FILE1>){
>    chomp;
>    push(@short, $_);
> }
> close FILE1; #Close the file
> 
> open(FILE2, '<', "long_sequences.txt")  || die "Can't open long_sequences.txt: $!\n";
> while(<FILE2>){
>    chomp;
>    push(@long, $_);
> }
> close FILE2; #Close the file
> 
> # replacement
> foreach my $short(@short){
>    foreach my $long(@long){
>        $offset = int(rand(length($long)%193));
>        substr($long,$offset,length($short),$short);
>        printf "%3d", $offset+1;
>        print "\n", $long, "\n";
> 
>    }
> }
> ********************************************
> 
> But I just realized that there is a problem with the
> two loops. The problem is that each short sequence will be
> used to replace all long sequences, not just the
> corresponding one.
> 
> So I seek your suggestions on how to handle two files
> simultaneously for my case.
> 
> Thank you very much and look forward to your reply!
> 
> Best Regards,
>     Alex
> 

