[Biojava-dev] Problem with SymbolListCharSequence and Regex
Matthew Pocock
matthew_pocock at yahoo.co.uk
Thu Oct 30 06:02:15 EST 2003
Hi,
This is a coordinate systems problem. Strings index from 0 to length-1,
and ranges are inclusive of the min index and exclusive of the max
index. Sequences index from 1 to length, and ranges are inclusive of
both the min index and the max index.
There was a bug in SymbolListCharSequence where code wasn't taking this
into account. Now fixed in CVS.
Matthew
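public CharSequence subSequence(int start, int end)
{
    // The 0-based, end-exclusive range [start, end) maps to the 1-based,
    // inclusive range [start + 1, end]. The second argument was end + 1
    // before the fix.
    return new SymbolListCharSequence(syms.subList(start + 1, end),
                                      alphaTokens);
}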
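
The index mapping can be checked without BioJava. The following is only a
minimal sketch: OneBasedList and ListCharSequence are hypothetical stand-ins
(not BioJava classes) that mimic a 1-based, inclusive-range list and a
CharSequence view over it, and firstMatch is an illustrative helper. With
subList(start + 1, end) the regex group comes back as the single matched
character rather than one character too many.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CoordinateDemo {

    /** Hypothetical stand-in for a 1-based list with inclusive ranges. */
    static final class OneBasedList {
        private final String data;

        OneBasedList(String data) { this.data = data; }

        int length() { return data.length(); }

        // Valid indices run from 1 to length().
        char symbolAt(int index) { return data.charAt(index - 1); }

        // Both min and max are included in the result.
        OneBasedList subList(int min, int max) {
            return new OneBasedList(data.substring(min - 1, max));
        }
    }

    /** CharSequence view (0-based, end-exclusive) over the 1-based list. */
    static final class ListCharSequence implements CharSequence {
        private final OneBasedList syms;

        ListCharSequence(OneBasedList syms) { this.syms = syms; }

        @Override public int length() { return syms.length(); }

        @Override public char charAt(int index) {
            return syms.symbolAt(index + 1); // shift 0-based to 1-based
        }

        @Override public CharSequence subSequence(int start, int end) {
            // [start, end) in 0-based terms is [start + 1, end] in 1-based terms.
            return new ListCharSequence(syms.subList(start + 1, end));
        }

        @Override public String toString() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < length(); i++) {
                sb.append(charAt(i));
            }
            return sb.toString();
        }
    }

    /** Returns the first case-insensitive match of regex in text, or null. */
    static String firstMatch(String regex, CharSequence text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        CharSequence seq = new ListCharSequence(new OneBasedList("gcat"));
        System.out.println("symbol: " + firstMatch("C", seq));
    }
}
```

Changing the second argument in subSequence back to end + 1 makes the same
match return two characters, which is the behaviour reported below.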
Ido M. Tamir wrote:
>Hi,
>could this be a bug ?
>
>The regex captured group returned from a
>SymbolListCharSequence is 1 char more extended
>to the right than expected.
>
>Thank you very much for
>your time and effort.
>
>Ido M. Tamir
>
>
>Output for the testcase below:
>
>string: C
>symbol: ca gcat
>
>---testcase:
>
>
>package mf;
>
>import java.util.regex.Matcher;
>import java.util.regex.Pattern;
>
>import org.biojava.bio.seq.DNATools;
>import org.biojava.bio.seq.io.SymbolListCharSequence;
>import org.biojava.bio.symbol.SymbolList;
>
>
>public class TestRegex {
>    public static void main(String[] args) {
>        try {
>            Pattern p = Pattern.compile("C", Pattern.CASE_INSENSITIVE);
>            String strSeq = "GCAT";
>            SymbolList symSeq = DNATools.createDNA(strSeq);
>            Matcher m = p.matcher(strSeq);
>            if (m.find()) {
>                System.out.println("string: " + m.group());
>            }
>            m = p.matcher(new SymbolListCharSequence(symSeq));
>            if (m.find()) {
>                System.out.println("symbol: " + m.group() + " "
>                        + new SymbolListCharSequence(symSeq));
>            }
>        } catch (Exception e) {
>            e.printStackTrace();
>        }
>    }
>}
>
>
>_______________________________________________
>biojava-dev mailing list
>biojava-dev at biojava.org
>http://biojava.org/mailman/listinfo/biojava-dev
>
>
>