[Bioperl-l] concatenate two embl sequence files
Roy Chaudhuri
roy at colibase.bham.ac.uk
Thu Jan 26 13:18:03 UTC 2006
Heikki Lehvaslaiho wrote:
> Thanks Roy!
>
> I'll check the code in tomorrow when I am less sleepy and can go through the
> code in detail. In principle the code looks good. It definitely needs tests.
> If you have written any please do post them.
I'm not too sure how to go about writing tests. Any suggestions?
It did occur to me that my _coordAdjust method could be adapted to allow
the Bio::Seq trunc method to retain sequence features (since there's no
reason why the $add argument can't be negative). This would probably
need a bit more work to cope with the situation where a feature overlaps
the trunc coordinates, for example if we truncate to coordinates 1..400,
but there's a feature 300..500. I guess the 'correct' behaviour might be
to convert that feature to a fuzzy location of 300..>400? Or is it
acceptable to have features with coordinates outside of a sequence?
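To make the overlap case concrete, here is a rough sketch of the clipping
idea in plain Perl. It deliberately avoids the Bio::SeqFeature API: features
are just start/end numbers, and clip_feature and location_string are
hypothetical helper names invented for this illustration, not existing
BioPerl methods. It assumes the 'fuzzy location' behaviour is the one we want.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): map a feature's 1-based
# coordinates onto a sequence truncated to $from..$to.
# Returns nothing (undef in scalar context) if the feature lies wholly
# outside the truncated region.
sub clip_feature {
    my ($start, $end, $from, $to) = @_;
    return if $end < $from || $start > $to;

    # A feature running past either edge is clipped and marked fuzzy.
    my $fuzzy_start = $start < $from ? 1 : 0;
    my $fuzzy_end   = $end   > $to   ? 1 : 0;
    $start = $from if $fuzzy_start;
    $end   = $to   if $fuzzy_end;

    # Negative offset, playing the role of a negative $add.
    my $add = -($from - 1);
    return {
        start       => $start + $add,
        end         => $end + $add,
        fuzzy_start => $fuzzy_start,
        fuzzy_end   => $fuzzy_end,
    };
}

# Render the result in EMBL-style location notation.
sub location_string {
    my ($f) = @_;
    return ($f->{fuzzy_start} ? '<' : '') . $f->{start} . '..'
         . ($f->{fuzzy_end}   ? '>' : '') . $f->{end};
}

# The example from above: truncate to 1..400 with a feature at 300..500.
print location_string(clip_feature(300, 500, 1, 400)), "\n";   # 300..>400
```

A real implementation would have to make the same decision for split
locations and for features on the reverse strand, which this sketch ignores.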
If we did that then an obvious test would be to cat a sequence to
itself, then trunc to retain just the second half of the new sequence
and see if you got back what you started with.
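Something along these lines is what I have in mind, written with the core
Test::More module. This is only a sketch of the round trip: the sequences
are toy hashes rather than Bio::Seq objects, and cat_seqs and trunc_seq are
hypothetical stand-ins for the real cat and a feature-aware trunc.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Toy stand-ins for Bio::Seq objects: a string plus [start, end] features.
# cat_seqs and trunc_seq are hypothetical, not BioPerl methods.
sub cat_seqs {
    my @seqs = @_;
    my %new = (seq => '', features => []);
    for my $s (@seqs) {
        my $add = length $new{seq};    # offset for this sequence's features
        push @{ $new{features} },
            map { [ $_->[0] + $add, $_->[1] + $add ] } @{ $s->{features} };
        $new{seq} .= $s->{seq};
    }
    return \%new;
}

sub trunc_seq {
    my ($s, $from, $to) = @_;
    my $add = -($from - 1);            # negative offset
    # Keep only features that lie entirely inside $from..$to, then shift them.
    my @kept = map  { [ $_->[0] + $add, $_->[1] + $add ] }
               grep { $_->[0] >= $from && $_->[1] <= $to }
               @{ $s->{features} };
    return {
        seq      => substr($s->{seq}, $from - 1, $to - $from + 1),
        features => \@kept,
    };
}

my $orig = { seq => 'ACGTACGTAC', features => [ [2, 5], [7, 10] ] };
my $len  = length $orig->{seq};

# Cat the sequence to itself, then keep just the second half.
my $cat  = cat_seqs($orig, $orig);
my $half = trunc_seq($cat, $len + 1, 2 * $len);

is(length($cat->{seq}), 2 * $len, 'concatenation doubles the length');
is($half->{seq}, $orig->{seq}, 'second half has the original sequence');
is_deeply($half->{features}, $orig->{features},
    'second half has the original features');
```

The real test would compare the features of actual Bio::Seq objects in the
same way, which needs some agreed notion of when two features are equal.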
> A few more checks to make sure seq->alphabet is the same in all sequences
> might be a good idea.
That's easy to implement. Just put the line:
    $self->throw('Trying to concatenate sequences with different alphabets: '
        . $seq->display_id . ' (' . $seq->alphabet . ') and '
        . $_->display_id . ' (' . $_->alphabet . ')')
        unless $_->alphabet eq $seq->alphabet;
at the start of the for(@seqs) loop of the cat subroutine.
Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.
http://xbase.bham.ac.uk