[Biojava-l] SimpleSequenceBuilder behaviour

Thomas Down td2@sanger.ac.uk
Tue, 27 Mar 2001 12:00:48 +0100


On Tue, Mar 27, 2001 at 11:11:39AM +0100, Keith James wrote:
> 
> Someone has just pointed out to me that the behaviour of the
> SimpleSequenceBuilder with respect to FeatureProperties has changed
> when adding new properties. In addProperty():
> 
> 
> if (oldValue != null) {
>     if (oldValue instanceof String) {
> 	newValue = ((String) oldValue) + " " + newValue.toString();
>     } else {
> 	if (oldValue instanceof Collection) {
> 	    ((Collection) oldValue).add(newValue);
> 	    newValue = oldValue;
> 	} else {
> 	    List nvList = new ArrayList();
> 	    nvList.add(oldValue);
> 	    nvList.add(newValue);
> 	    newValue = nvList;
> 	}
>     }
> }
> 
> 
> So Strings are a special case and are being (unexpectedly, for us)
> concatenated. This means that EMBL/Genbank features with multiple,
> say, /gene qualifiers are having them mangled.
> 
> If I remove the first 3 lines of this block, things will work
> again. But would that break anything else?


This has been an open issue for a while.  The string-munging-together
thing seemed like a good idea at the time, but was probably misguided.

Feel free to check in code that uses lists for everything -- it'll
certainly be cleaner.
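
For illustration, here is a rough sketch of what the list-based version
might look like. It is simply the quoted block with the String special
case dropped, so a repeated key ends up holding a List rather than a
concatenated String. The class name ListPropertySketch, the backing
HashMap and the getProperty() accessor are made up for this example and
are not the real SimpleSequenceBuilder code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the builder's property handling;
// this is NOT the actual SimpleSequenceBuilder class.
class ListPropertySketch {
    // Assumed storage: property key -> single value, or a Collection of values.
    private final Map<Object, Object> properties = new HashMap<>();

    @SuppressWarnings("unchecked")
    void addProperty(Object key, Object newValue) {
        Object oldValue = properties.get(key);
        if (oldValue != null) {
            if (oldValue instanceof Collection) {
                // Key already holds several values: append to the existing collection.
                ((Collection<Object>) oldValue).add(newValue);
                newValue = oldValue;
            } else {
                // Second value for this key (Strings included): start a list.
                List<Object> nvList = new ArrayList<>();
                nvList.add(oldValue);
                nvList.add(newValue);
                newValue = nvList;
            }
        }
        properties.put(key, newValue);
    }

    // Hypothetical accessor, added only so the result can be inspected.
    Object getProperty(Object key) {
        return properties.get(key);
    }
}
```

With this, a feature with a single /gene qualifier would still carry a
plain String, while a second /gene qualifier would turn the value into
a List holding both Strings in the order they arrived.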

> The alternative is for the parsers to keep a check of what qualifier
> keys have been sent to the listener and always supply the first value
> String wrapped in a List, then subsequent values as Strings, which is
> a hack likely to break again later.

No, definitely don't do this.

Thomas.