[Bioperl-l] Make edits to a large sequence

wannymahoots dan.halligan at gmail.com
Tue Jun 28 12:46:09 UTC 2011


Hi,

I'm looking for the quickest / most efficient way to make many edits
(mutations) to a long FASTA sequence using BioPerl.  The sequences are
on the order of 200 Mb long, and I would like to make thousands of
single-base changes (e.g. A->T at position 1,000, G->C at position
1,201, etc.).  The only way I've come across to do this is to read in
the sequence and then make the edits using Bio::SeqUtils, something like:

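use Bio::SeqIO;
use Bio::SeqUtils;
use Bio::LiveSeq::Mutation;

my $in = Bio::SeqIO->new(-file => "file.fa", -format => "fasta");

while (my $seq = $in->next_seq()) {
    # Describe a mutation to 'c' at position 3, then apply it to $seq
    my $mut = Bio::LiveSeq::Mutation->new(-seq => 'c', -pos => 3);
    Bio::SeqUtils->mutate($seq, $mut);
}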

However, I'm concerned that this might make multiple copies of the
large sequence, and that using substr (which is how mutate works) is
perhaps not the most efficient approach.  Would it be better to store
the FASTA sequence as an array and change individual array positions
directly?
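
To make the "edit directly" alternative concrete, here is a minimal sketch
in plain Perl, without BioPerl.  It keeps the sequence as one string and
overwrites single bases with 4-argument substr.  The helper name
apply_point_edits and the [position, base] edit format are made up for
illustration and are not part of any library; the toy sequence stands in
for the real 200 Mb one.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): apply single-base substitutions
# to a sequence string in place. $seq_ref is a reference to the sequence
# string; $edits is an array ref of [1-based position, new base] pairs.
sub apply_point_edits {
    my ($seq_ref, $edits) = @_;
    my $len = length $$seq_ref;
    for my $edit (@$edits) {
        my ($pos, $base) = @$edit;
        die "position $pos is outside 1..$len\n" if $pos < 1 || $pos > $len;
        die "expected a single base, got '$base'\n" unless length($base) == 1;
        # 4-argument substr overwrites one character of the existing string;
        # passing a reference avoids copying the sequence into the sub.
        substr($$seq_ref, $pos - 1, 1, $base);
    }
    return scalar @$edits;
}

my $sequence = 'ACGTACGTAC';    # toy stand-in for a very long sequence
my $applied  = apply_point_edits(\$sequence, [ [1, 'T'], [4, 'C'], [10, 'G'] ]);
print "$applied edits: $sequence\n";    # prints "3 edits: TCGCACGTAG"
```

As far as I understand, a same-length substr replacement on a string does
not need to reallocate it, whereas a Perl array with one element per base
carries tens of bytes of overhead per element, which would run to
gigabytes for a 200 Mb sequence.  Both points are worth measuring on real
data before relying on them.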

Many thanks for any advice.


