[Bioperl-l] concatenate two embl sequence files

Roy Chaudhuri roy at colibase.bham.ac.uk
Thu Jan 26 13:18:03 UTC 2006


Heikki Lehvaslaiho wrote:
> Thanks Roy!
> 
> I'll check the code in tomorrow when I am less sleepy and can go through the 
> code in detail. In principle the code looks good. It definitely needs tests. 
> If you have written any please do post them.
I'm not sure how to go about writing tests; any suggestions?

It did occur to me that my _coordAdjust method could be adapted to allow 
the Bio::Seq trunc method to retain sequence features (since there's no 
reason why the $add argument can't be negative). This would probably 
need a bit more work to cope with features that extend past the trunc 
boundaries: for example, if we truncate to coordinates 1..400 but there's 
a feature at 300..500. I guess the 'correct' behaviour might be 
to convert that feature to a fuzzy location of 300..>400? Or is it 
acceptable to have features with coordinates outside of a sequence?
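To make the fuzzy-location idea concrete, here is a minimal sketch in plain Perl of how a feature's coordinates might be mapped onto a truncated sequence. It does not use BioPerl: `trunc_location` is a hypothetical helper, not the `_coordAdjust` method or any existing Bio::Seq API, and it assumes the 300..>400 behaviour suggested above (dropping features that fall entirely outside the kept region).

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl API): maps a feature at
# $f_start..$f_end on the original sequence onto the result of
# trunc($start, $end). The shift uses a negative offset, in the spirit
# of passing a negative $add to _coordAdjust. Returns an EMBL-style
# location string, with '<' or '>' marking an end clipped by the trunc,
# or undef if the feature lies entirely outside the kept region.
sub trunc_location {
    my ($f_start, $f_end, $start, $end) = @_;
    return if $f_end < $start || $f_start > $end;    # no overlap: drop it

    my $add = -($start - 1);        # shift into truncated coordinates
    my $len = $end - $start + 1;    # length of the truncated sequence
    my ($s, $e) = ($f_start + $add, $f_end + $add);

    my $s_str = $s < 1    ? '<1'    : $s;    # starts before the kept region
    my $e_str = $e > $len ? ">$len" : $e;    # ends after the kept region
    return "$s_str..$e_str";
}

my $loc = trunc_location(300, 500, 1, 400);
print "$loc\n";    # prints 300..>400
```

Truncating the same feature to 401..800 instead would give `<1..100`, since both the shift and the clipping are applied.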

If we did that, an obvious test would be to cat a sequence to itself, 
trunc the result to keep just the second half, and check that we get 
back what we started with.
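The round-trip test could look roughly like the following Test::More sketch. It deliberately avoids BioPerl: plain hashes stand in for Bio::Seq objects, and `cat_seqs` and `trunc_seq` are hypothetical helpers, not the cat subroutine under discussion. This `trunc_seq` simply drops features not wholly inside the kept region (no fuzzy clipping), which is enough here because no feature straddles the join.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical stand-ins for Bio::Seq objects: a sequence string plus
# a list of [start, end] feature locations (1-based, inclusive).
sub cat_seqs {
    my ($first, @rest) = @_;
    my %new = (seq      => $first->{seq},
               features => [ map { [ @{$_} ] } @{ $first->{features} } ]);
    for my $s (@rest) {
        my $add = length $new{seq};    # appended features shift by this much
        push @{ $new{features} },
            map { [ $_->[0] + $add, $_->[1] + $add ] } @{ $s->{features} };
        $new{seq} .= $s->{seq};
    }
    return \%new;
}

# Keeps only features wholly inside $start..$end, shifting them by the
# negative offset -($start - 1) into the truncated coordinates.
sub trunc_seq {
    my ($seq, $start, $end) = @_;
    my $add  = -($start - 1);
    my @kept = map  { [ $_->[0] + $add, $_->[1] + $add ] }
               grep { $_->[0] >= $start && $_->[1] <= $end } @{ $seq->{features} };
    return { seq      => substr($seq->{seq}, $start - 1, $end - $start + 1),
             features => \@kept };
}

my $orig    = { seq => 'ATGCGTACGTTAG', features => [ [1, 3], [5, 9] ] };
my $len     = length $orig->{seq};
my $doubled = cat_seqs($orig, $orig);                  # cat to itself
my $back    = trunc_seq($doubled, $len + 1, 2 * $len); # keep second half

is($back->{seq}, $orig->{seq}, 'second half matches the original sequence');
is_deeply($back->{features}, $orig->{features}, 'features survive the round trip');
```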

> A few more checks to make sure seq->alphabet is the same in all sequences 
> might be a good idea.
That's easy to implement. Just put the line:
	$self->throw('Trying to concatenate sequences with different alphabets: '
	    . $seq->display_id . ' (' . $seq->alphabet . ') and '
	    . $_->display_id . ' (' . $_->alphabet . ')')
	    unless $_->alphabet eq $seq->alphabet;

at the start of the for(@seqs) loop of the cat subroutine.
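A self-contained sketch of where that check sits in the loop is below. It assumes plain hashes in place of Bio::Seq objects and uses `die` in place of `$self->throw`; `cat_checked` is a hypothetical name, not the actual cat subroutine.

```perl
use strict;
use warnings;

# Hypothetical sketch: hashes stand in for Bio::Seq objects and die
# stands in for $self->throw. As suggested, the alphabet check runs at
# the start of each pass through the loop, before anything is appended.
sub cat_checked {
    my ($seq, @seqs) = @_;
    for (@seqs) {
        die 'Trying to concatenate sequences with different alphabets: '
            . "$seq->{display_id} ($seq->{alphabet}) and "
            . "$_->{display_id} ($_->{alphabet})\n"
            unless $_->{alphabet} eq $seq->{alphabet};
        $seq->{seq} .= $_->{seq};    # only reached when alphabets match
    }
    return $seq;
}

my $dna  = { display_id => 'd1', alphabet => 'dna',     seq => 'ACGT' };
my $prot = { display_id => 'p1', alphabet => 'protein', seq => 'MKV' };
eval { cat_checked($dna, $prot); 1 } or print "caught: $@";
```

One design consequence: because the check is inside the loop, sequences earlier in `@seqs` are already appended when a later one fails. Validating every alphabet in a separate pass before the loop would avoid leaving the first sequence partially modified.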

Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk


