[Bioperl-l] Should coords be adjusted after removing alignment columns?

Sendu Bala bix at sendu.me.uk
Tue Aug 14 17:13:30 UTC 2007


Chris Fields wrote:
> Could you attach the scripts and patches to a bug report for tracking
> so anyone interested can double-check?  Having them in an email is 
> problematic as the text in some clients wraps.

http://bugzilla.open-bio.org/show_bug.cgi?id=2344


> From what I'm seeing I think we're in general agreement, though I'll
> reason through it to see if I'm following correctly.  The data in
> the SimpleAlign example you give is this:
> 
> a/5-20            atcgatcgatcgatcg
> b/30-43           -tcgatc-atcgatcg
> c/50-63           atcgatcgatc-atc-
>                    ****** *** ***
> 
> Removing the gaps gives:
> 
> a/5-20            tcgatcatcatc
> b/30-43           tcgatcatcatc
> c/50-63           tcgatcatcatc
>                   ************
> 
> The start/end is wrong, as you state.

Yes. For extra clarity, my thinking is that the correct answer is:

a/6-19            tcgatcatcatc
b/30-42           tcgatcatcatc
c/51-63           tcgatcatcatc
                  ************
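
To make the arithmetic behind those numbers explicit, here is a minimal
standalone sketch in Python. It is not the patch attached to the bug and
does not use any BioPerl API; the function name and the tuple layout are
made up for illustration. It assumes 1-based inclusive coordinates and
that every column containing a gap in any row is dropped. Each start
moves forward by the number of residues lost before the first kept
column, and each end moves back by the number lost after the last kept
column.

```python
GAP = "-"

def remove_gap_columns(rows):
    """rows: list of (name, start, end, aligned_string) tuples.

    Drops every column that has a gap in any row and returns the rows
    with start/end adjusted (1-based, inclusive coordinates assumed).
    """
    width = len(rows[0][3])
    keep = [i for i in range(width)
            if all(seq[i] != GAP for _, _, _, seq in rows)]
    if not keep:
        raise ValueError("every column contains a gap")
    first, last = keep[0], keep[-1]

    result = []
    for name, start, end, seq in rows:
        # Residues (non-gap characters) trimmed off each end of this row.
        lost_left = sum(1 for ch in seq[:first] if ch != GAP)
        lost_right = sum(1 for ch in seq[last + 1:] if ch != GAP)
        new_seq = "".join(seq[i] for i in keep)
        result.append((name, start + lost_left, end - lost_right, new_seq))
    return result

rows = [
    ("a", 5, 20, "atcgatcgatcgatcg"),
    ("b", 30, 43, "-tcgatc-atcgatcg"),
    ("c", 50, 63, "atcgatcgatc-atc-"),
]
for name, start, end, seq in remove_gap_columns(rows):
    print(f"{name}/{start}-{end}  {seq}")
```

Note that this only fixes the outer bounds: for "a" the range 6-19 spans
14 residues while the gap-free string has 12, because two internal
residues were removed along with the gap columns.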


> Adjusting to map simple start/ends to the original sequence won't
> work as we're removing gaps and residues in the LocatableSeqs along
> with it (ends and internal residues).  I guess if we want to map back
> to the original sequence accurately [snip]

The rest of your discussion is valid and deserves some thought, but for
now just getting the start and end correct, ignoring any issues with
internal residues, seems like a no-brainer.

For my own purposes that is all I need: having removed the gap columns,
I only need the start and end so that I can, for example, take that
region from each sequence and do a new alignment.



BTW, either my patch isn't quite perfect or there's another related bug
that I'm still tracking down. I'll commit once I've solved that, unless
someone points out a mistake in my thinking.


