[Biojava-l] Parsing existing gaps

Ditlev Egeskov Brodersen deb at mb.au.dk
Thu Nov 15 12:04:02 UTC 2007


Dear all,

 

I have managed to read an MSF-formatted alignment from a file selected
through FileChooser as follows:

 

  BufferedReader br = new BufferedReader(
      new FileReader(aFileChooser.getSelectedFile()));
  SimpleAlignment align =
      (SimpleAlignment) SeqIOTools.fileToBiojava(AlignIOConstants.MSF_AA, br);

 

I can now retrieve the sequence names and sequences through the Alignment
object:

 

  Iterator aLabels = align.getLabels().iterator();
  Iterator aSequences = align.symbolListIterator();

 

However, I now want to be able to translate between real sequence numbers
and positions within each alignment string, i.e. to obtain positions with
the gaps removed (gaps are represented by hyphens '-' in the MSF
format). How can I tell BioJava to parse the gaps into a GappedSequence?
I have tried the following to check what position 15 (past the
first gap) translates into:

 

  int n = 0;
  while (aSequences.hasNext()) {
      SimpleSymbolList aSym = (SimpleSymbolList) aSequences.next();
      SimpleGappedSequence aGapped = new SimpleGappedSequence(
          new SimpleSequence(aSym, "", aLabels.next().toString(), null));
      System.out.println(aGapped.gappedToLocation(new PointLocation(15)));
  }

 

But I only get 15 back out. I have also studied the constructor of the
underlying SimpleGappedSymbolList but it simply copies the SymbolList and
creates one big block:

 

  public SimpleGappedSymbolList(SymbolList source) {
    this.source = source;
    this.alpha = source.getAlphabet();
    this.blocks = new ArrayList();
    this.length = source.length();
    Block b = new Block(1, length, 1, length);
    blocks.add(b);
  }
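
What I expected instead was one block per ungapped run. As a rough sketch in
plain Java (this is not BioJava code; GapBlocks and blocksOf are made-up
names, and I am assuming the four Block arguments are source start/end
followed by view start/end, all 1-based and inclusive), the blocks for a
hyphen-gapped string could be derived along these lines:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava.
class GapBlocks {

    /**
     * Splits a hyphen-gapped string into ungapped runs. Each entry is
     * {sourceStart, sourceEnd, viewStart, viewEnd}, 1-based and inclusive,
     * where "source" counts residues only and "view" counts alignment columns.
     */
    static List<int[]> blocksOf(String gapped) {
        List<int[]> blocks = new ArrayList<>();
        int source = 0;     // residues seen so far
        int runStart = -1;  // view column where the current run began, -1 inside a gap
        for (int col = 1; col <= gapped.length(); col++) {
            boolean gap = gapped.charAt(col - 1) == '-';
            if (!gap) {
                source++;
                if (runStart < 0) {
                    runStart = col;
                }
            }
            boolean lastCol = col == gapped.length();
            if (runStart > 0 && (gap || lastCol)) {
                // Close the run: it ends just before a gap, or at the final column.
                int viewEnd = gap ? col - 1 : col;
                int len = viewEnd - runStart + 1;
                blocks.add(new int[] {source - len + 1, source, runStart, viewEnd});
                runStart = -1;
            }
        }
        return blocks;
    }
}
```

For the toy string "MK--LV-A" this yields three blocks rather than the single
block the constructor above creates.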

 

Is there a way to tell SimpleGappedSequence to parse itself in terms of the
gap characters in the sequence string? How is the sequence represented in
this case, if not by gaps? Surely the hyphen cannot be a part of the
standard PROTEIN-TERM alphabet, yet I get no complaints for the use of it?
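
To make the goal concrete, here is a minimal sketch in plain Java of the
translation I am after. It makes no BioJava calls, the names GapMapper,
gappedToUngapped and ungappedToGapped are hypothetical, and it assumes that
'-' is the only gap character in the alignment string:

```java
// Hypothetical helper, not part of BioJava.
final class GapMapper {

    /**
     * Maps a 1-based alignment column to the 1-based residue number in the
     * ungapped sequence, or returns -1 if the column holds a gap.
     */
    static int gappedToUngapped(String aligned, int column) {
        if (column < 1 || column > aligned.length()) {
            throw new IndexOutOfBoundsException("column " + column);
        }
        if (aligned.charAt(column - 1) == '-') {
            return -1;
        }
        int residues = 0;
        for (int i = 0; i < column; i++) {
            if (aligned.charAt(i) != '-') {
                residues++;
            }
        }
        return residues;
    }

    /** Maps a 1-based residue number to its 1-based alignment column. */
    static int ungappedToGapped(String aligned, int residue) {
        int seen = 0;
        for (int i = 0; i < aligned.length(); i++) {
            if (aligned.charAt(i) != '-' && ++seen == residue) {
                return i + 1;
            }
        }
        throw new IndexOutOfBoundsException("residue " + residue);
    }
}
```

With the toy string "MK--LV-A", column 5 would map to residue 3 and residue 5
to column 8; that is the kind of answer I hoped gappedToLocation would give.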

 

Best wishes,

 

  Ditlev

 

--

 

Ditlev E. Brodersen, Ph.D.
Lektor, Associate Professor

 

Department of Molecular Biology   Office:  +45 89425259
University of Aarhus              Lab:     +45 89425022
Gustav Wieds Vej 10c              Fax:     +45 86123178
DK-8000 Aarhus C                  Email:   deb at mb.au.dk
Denmark                           Lab WWW: www.bioxray.dk/~deb

 



