[Bioperl-l] Bio::LocatableSeq end checking inconsistency
Jun Yin
jun.yin at ucd.ie
Fri Aug 13 13:36:34 UTC 2010
Hi, all,
I am the google summer of code student working on Bio::Align subsystem
refactoring. The code (Bio::SimpleAlign) I re-implemented now has passed
nearly all the test, except a few tests on seq/start-end testing. But here
comes a problem. This may be an old issue, that the Bio::LocatableSeq end
assignment and checking are inconsistent.
The current end checking method is based on:
$end=$seq->_ungapped_len+$seq->start-1
However, this checking may not fit the real world case.
The inconsistency usually happens when a few columns of the sequence are
removed.
For example:
my $a = Bio::LocatableSeq->new(
-id => 'a',
-strand => 1,
-seq => '-tcgatc-atcgatcg',
-start => 30,
-end => 43
);
If we remove the 1st, 8th and the last columns
$a->seq() will be 'tcgatcatcgatc'
$a->_ungapped_len==12
Actually, in the real world, the first residue will still be 30 (the old
$seq->start), and the last residue is the residue before the 43 (the old
$seq->end), thus 42.
But if you call a validation, the calculation is
$a->_ungapped_len+$a->start-1=12+30-1=41
So the reassignment of the $seq->end will not pass the validation.
So unless you save the information to a new sequence object, the original
position information will be lost anyway. But in some cases, we have to
change the sequence in its original sequence object ..
What is your suggestion on this issue?
A. pass the test and lose the information #convenient in coding but the
start-end annotation is not right any more
B. keep the information and forget the test #the object will still
remember where the last residue was in the original sequence. But is it
really meaningful at all? Because all the other residues may come from
nowhere
C. Neither of above #any other suggestions?
Cheers,
Jun Yin
Ph.D. student in U.C.D.
Bioinformatics Laboratory
Conway Institute
University College Dublin
More information about the Bioperl-l
mailing list