[Biojava-l] Getting a Slice of an Alignment

Richard Holland richard.holland at ebi.ac.uk
Tue Jun 27 15:26:37 UTC 2006


Ah...

I just read the source code for the symbolListForLabel() method on sub
alignments, and found what may well be a bug.

BioJava list people, your help please! In my understanding,
symbolListForLabel() should return the symbols from the given label that
fall within the alignment. This is the case for every alignment type
except sub alignments. Sub alignments, for whatever reason, return the
symbols from the given label that fall within the parent alignment upon
which the sub alignment is based, NOT just those that fall within the
sub alignment itself.

Is this a bug? I think it is.

The solution would be for me to alter
AbstractULAlignment.SubULAlignment.symbolListForLabel() to restrict the
returned symbols to only include those in the area covered by the sub
alignment. It would return EMPTY_SEQUENCE if the label didn't cover the
area of the sub alignment, and it would return a truncated symbol list
if it only partially covered it.
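
To make the proposed behaviour concrete, here is a rough sketch of the
clipping rule only. It is not the BioJava code: plain Strings stand in
for SymbolLists, and the class name, method name and parameters are all
hypothetical. It assumes each label occupies one contiguous range of
columns in the parent alignment, with 1-based inclusive coordinates.

```java
// Hypothetical sketch: none of these names are BioJava API.
final class SubAlignmentClipSketch {

    // Stand-in for the EMPTY_SEQUENCE mentioned above.
    static final String EMPTY_SEQUENCE = "";

    /**
     * Returns the part of one label's symbols that falls inside the
     * sub alignment window.
     *
     * @param symbols  the label's symbols (assumed contiguous)
     * @param rowStart 1-based parent column of the first symbol
     * @param winMin   1-based first parent column of the sub alignment
     * @param winMax   1-based last parent column of the sub alignment
     */
    static String symbolsInWindow(String symbols, int rowStart,
                                  int winMin, int winMax) {
        int rowEnd = rowStart + symbols.length() - 1;

        // Intersect the label's range with the window.
        int from = Math.max(rowStart, winMin);
        int to = Math.min(rowEnd, winMax);

        // No overlap: the label does not cover the sub alignment at all.
        if (from > to) {
            return EMPTY_SEQUENCE;
        }

        // Full or partial overlap: return only the overlapping symbols.
        return symbols.substring(from - rowStart, to - rowStart + 1);
    }
}
```

With a window of columns 3 to 5, a label that spans the whole window
comes back cut down to three symbols, a label that starts at column 4
comes back truncated to two, and a label that starts at column 10 comes
back as EMPTY_SEQUENCE.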

Would this be acceptable?

If so, this change would fix Ed's problem below: subAlignment() would
start returning vertical slices, as I think it should have done from
the start, rather than the horizontal slices it returns at present.
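
For anyone following along, the difference between the two kinds of
slice can be shown on a toy alignment. Again this is only a sketch and
not BioJava code: rows are gap-padded Strings of equal length, '-' is
assumed to be the gap character, columns are 1-based and inclusive, and
both method names are made up. The "horizontal" method follows Ed's
description of the current behaviour; the "vertical" one is what he is
asking for.

```java
// Hypothetical sketch: none of these names are BioJava API.
final class SliceComparisonSketch {

    /** Vertical slice: every row, cut down to columns min..max. */
    static String[] verticalSlice(String[] rows, int min, int max) {
        String[] out = new String[rows.length];
        for (int i = 0; i < rows.length; i++) {
            int from = Math.max(min, 1);
            int to = Math.min(max, rows[i].length());
            out[i] = (from > to) ? "" : rows[i].substring(from - 1, to);
        }
        return out;
    }

    /** True if the row has at least one non-gap symbol in min..max. */
    static boolean hasSymbolsIn(String row, int min, int max) {
        int from = Math.max(min, 1);
        int to = Math.min(max, row.length());
        for (int col = from; col <= to; col++) {
            if (row.charAt(col - 1) != '-') {
                return true;
            }
        }
        return false;
    }

    /** Horizontal slice: whole rows, kept only if they touch min..max. */
    static String[] horizontalSlice(String[] rows, int min, int max) {
        // First pass counts the rows to keep, second pass copies them.
        int kept = 0;
        for (String row : rows) {
            if (hasSymbolsIn(row, min, max)) {
                kept++;
            }
        }
        String[] out = new String[kept];
        int next = 0;
        for (String row : rows) {
            if (hasSymbolsIn(row, min, max)) {
                out[next++] = row;
            }
        }
        return out;
    }
}
```

Given the rows "ACGTAC", "--GTTC" and "AC----" and columns 3 to 4, the
vertical slice is "GT", "GT", "--" (one short entry per row), while the
horizontal slice is the first two rows returned at full length.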

cheers,
Richard


On Tue, 2006-06-27 at 07:20 -0700, Dexter Riley wrote:
> Thanks for looking at the method!  I'll give your improved version a try.
> 
> subAlignment does return a slice of the Alignment; a horizontal slice.  I
> need a vertical slice at a given location.  In other words,
> subAlignment: 
> if sequence in alignment has symbols at location, return entire sequence
> get(Vertical)Slice:
> for sequence in alignment, return subsequence at location
> 
> I use slices for primer design, where I have a candidate primer location and
> want to see the list of different target sequences in the alignment at that
> position (so I can consider possible mismatches, Tm, etc.)  
> It would also be handy for the GUI, to say, "give me a view of bases
> 2000-2567 for every sequence in this really long alignment".
> 
> Thanks,
> Ed
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416



