[Bioperl-l] how to work on two txt files simultaneously by handle corresponding lines from each file

Alex Zhang mayagao1999 at yahoo.com
Mon Jul 18 17:06:10 EDT 2005


Dear All,

Sorry to bother you again.

I have two txt files to handle. One is
"short_sequences" and the other is "long_sequences".
The "short_sequences" file holds 100 short sequences
(8 nucleotides long) and the "long_sequences" file
holds 100 long sequences (200 nucleotides long).

For example, the first short sequence is "TTGACATA"
and the first long sequence is
"GAATCATATATTAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
GAACCTTGGACTAACCACTGTCTGGATA".

Basically, I want to generate a random position as a
starting site and replace a substring of the long
sequence with the short sequence. In this example, if
we choose the 5th nucleotide of the long sequence as
the starting site, then after replacement with
"TTGACATA" the long sequence is
"GAATTTGACATAAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
GAACCTTGGACTAACCACTGTCTGGATA".

Then I want to modify the 2nd long sequence with the
2nd short sequence, and repeat this until the last
long sequence has been processed. I think the only
constraint is that the starting site should not be
larger than 193 (200 - 8 + 1, counting from 1).
Otherwise, there are not enough nucleotides left in
the long sequence for the replacement.
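
The replacement and the limit on the starting site can
be sketched as below. This is only a minimal sketch:
the helper names replace_at and random_start are
hypothetical, starting sites are counted from 1 as in
the example above, and the long sequence used here is
just the first 28 nucleotides of the first long
sequence.

```perl
use strict;
use warnings;

# Hypothetical helper: overwrite part of $long with $short, beginning at a
# 1-based starting site. Dies if the short sequence would not fit.
sub replace_at {
    my ($long, $short, $start) = @_;
    my $max_start = length($long) - length($short) + 1;   # 193 for 200 and 8
    die "start $start is outside 1..$max_start\n"
        if $start < 1 || $start > $max_start;
    # 4-argument substr replaces length($short) characters in the local copy
    substr($long, $start - 1, length($short), $short);
    return $long;
}

# Hypothetical helper: pick a random 1-based starting site that always fits.
sub random_start {
    my ($long_len, $short_len) = @_;
    return 1 + int(rand($long_len - $short_len + 1));
}

# First 28 nucleotides of the first long sequence, start fixed at site 5
print replace_at("GAATCATATATTAGTCTCCACATACTCC", "TTGACATA", 5), "\n";
# prints GAATTTGACATAAGTCTCCACATACTCC

# A random site for a 200-nucleotide sequence and an 8-nucleotide insert
print random_start(200, 8), "\n";
```

Deriving the upper limit from the two lengths, rather
than hard-coding 193, keeps the sketch valid if the
sequence lengths ever change.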

Furthermore, I want to keep track of the starting
replacement site for each long sequence.


My code is copied below.
******************************************
use strict;
use warnings;

my (@short, @long, $offset); # @short holds the short sequences,
                             # @long holds the long sequences

open(FILE1, '<', "short_sequences.txt") || die "Can't open short_sequences.txt: $!\n";
while(<FILE1>){
   chomp;
   push(@short, $_);
}
close FILE1; #Close the file

open(FILE2, '<', "long_sequences.txt")  || die "Can't open long_sequences.txt: $!\n";
while(<FILE2>){
   chomp;
   push(@long, $_);
}
close FILE2; #Close the file


# replacement
foreach my $short(@short){
   foreach my $long(@long){
       # 0-based start chosen so the whole short sequence fits in the long one
       $offset = int(rand(length($long) - length($short) + 1));
       substr($long,$offset,length($short),$short);
       printf "%3d", $offset+1;
       print "\n", $long, "\n";

   }
}
********************************************

But I just realized that there is a problem with the
two nested loops: each short sequence is used to
modify every long sequence, not only the
corresponding one.

So I seek your suggestions on how to handle two files
simultaneously for my case. 
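
One possible approach, offered only as a sketch, is to
read one line from each file per loop iteration, so
that line N of the short file is only ever paired with
line N of the long file. The helper name replace_pairs
and the toy sequences are hypothetical, and in-memory
filehandles stand in for the two files so that the
example runs on its own. For the real data, the same
3-argument open would be given "short_sequences.txt"
and "long_sequences.txt" instead.

```perl
use strict;
use warnings;

# Hypothetical helper: pair up corresponding lines of the two handles and
# return one [1-based start, modified long sequence] entry per pair.
sub replace_pairs {
    my ($short_fh, $long_fh) = @_;
    my @results;
    while (defined(my $short = <$short_fh>)) {
        my $long = <$long_fh>;
        die "long file has fewer lines than short file\n" unless defined $long;
        chomp($short, $long);
        my $room = length($long) - length($short);
        die "short sequence is longer than long sequence\n" if $room < 0;
        my $offset = int(rand($room + 1));               # 0-based, always fits
        substr($long, $offset, length($short), $short);  # overwrite in place
        push @results, [$offset + 1, $long];             # keep the start site
    }
    return @results;
}

# Made-up toy data held in strings and opened as in-memory filehandles
my $short_data = "TTGACATA\nCCCCCCCC\n";
my $long_data  = "AAAAAAAAAAAAAAAAAAAA\nGGGGGGGGGGGGGGGGGGGG\n";
open(my $short_fh, '<', \$short_data) or die "Can't open short data: $!\n";
open(my $long_fh,  '<', \$long_data)  or die "Can't open long data: $!\n";

# Print the starting site, then the modified long sequence, for each pair
for my $pair (replace_pairs($short_fh, $long_fh)) {
    printf "%3d\n%s\n", @$pair;
}
```

Because the starting site is stored next to each
modified sequence, the list returned by the helper
also serves as the record of where each replacement
was made.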

Thank you very much and look forward to your reply!

Best Regards,
    Alex
