[Biojava-dev] Serialization problems, "-" turns to "n" after serializing sequence

mark.schreiber at novartis.com mark.schreiber at novartis.com
Wed Oct 19 21:05:56 EDT 2005


Hello -

I found out what was happening. It is not a problem with serialization but 
with the DNATools.createDNASequence() method, which wasn't dealing well 
with gaps. There is another method, DNATools.createGappedDNASequence(), 
that is supposed to do what you want. Ideally you shouldn't use 
createDNASequence() with gap symbols. I have now changed it so that if it 
detects a gap symbol it calls the createGapped method. This is in CVS. 
Your test seems to work now.
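
As a rough illustration of that dispatch (not the actual BioJava source), 
the check could look like the sketch below. GapAwareFactory, choosePath 
and the "plain"/"gapped" labels are hypothetical names made up for this 
example; the only thing taken from the thread is that a gap character in 
the input routes the call to the gapped construction path.

```java
// Hypothetical stand-in for the dispatch described above; not BioJava code.
class GapAwareFactory {
    // The gap character used in the test sequence from this thread.
    static final char GAP = '-';

    // Reports which (hypothetical) construction path would handle the input:
    // "gapped" if any gap character is present, otherwise "plain".
    static String choosePath(String residues) {
        return residues.indexOf(GAP) >= 0 ? "gapped" : "plain";
    }

    public static void main(String[] args) {
        System.out.println(choosePath("---ATGC---ATGC---"));
        System.out.println(choosePath("ATGC"));
    }
}
```

If you know your input contains gaps, calling 
DNATools.createGappedDNASequence() directly avoids relying on the check.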

More generally, I may need to apply this to RNATools and ProteinTools as 
well. I'll have a look.

- Mark





Mark Schreiber/GP/Novartis at PH
Sent by: biojava-dev-bounces at portal.open-bio.org
10/19/2005 11:19 AM

 
        To:     Kalle Näslund <kalle.naslund at genpat.uu.se>
        cc:     biojava-dev at biojava.org, (bcc: Mark Schreiber/GP/Novartis)
        Subject:        Re: [Biojava-dev] Serialization problems, "-" turns to "n" after serializing sequence


Hello -

What should happen is that the JVM calls a method named readResolve() on 
deserialization to replace the gap symbol that was deserialized with the 
gap symbol of the local AlphabetManager.

This prevents you from having a gap that is not == the gap provided by the 
alphabet manager. It seems that somehow it is instead being replaced by 
the ambiguity symbol n.
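
For anyone unfamiliar with the readResolve() hook, here is a minimal 
sketch of the general pattern using only the JDK. CanonicalGap and 
ReadResolveDemo are hypothetical names for this example and are not 
BioJava classes; the real gap symbol and AlphabetManager are more 
involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a canonical gap symbol; not BioJava's class.
final class CanonicalGap implements Serializable {
    private static final long serialVersionUID = 1L;

    // The single instance that every == comparison should see.
    static final CanonicalGap INSTANCE = new CanonicalGap();

    private CanonicalGap() {}

    // Invoked by the serialization machinery after the object is read.
    // The freshly deserialized copy is dropped and the local instance
    // is returned in its place.
    private Object readResolve() {
        return INSTANCE;
    }
}

class ReadResolveDemo {
    // Serializes an object to memory and reads it straight back.
    static Object roundTrip(Object original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Object copy = roundTrip(CanonicalGap.INSTANCE);
        // Identity holds only because readResolve() swapped the copy.
        System.out.println(copy == CanonicalGap.INSTANCE);
    }
}
```

Without the readResolve() method the round trip would hand back a 
distinct object, and an == test against the canonical instance would 
fail.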

It may take me a while to get around to looking at this. If you find it, 
please let me know. If I forget, please remind me : )

- Mark





Kalle Näslund <kalle.naslund at genpat.uu.se>
Sent by: biojava-dev-bounces at portal.open-bio.org
10/19/2005 02:04 AM

 
        To:     biojava-dev at biojava.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-dev] Serialization problems, "-" turns to "n" after serializing sequence


Hi!

I seem to be stuck with a serialization issue, somewhere deep in the 
alphabet code. The problem is that "-" turns into "n". This happens 
both with fairly new CVS code and with the 1.4 release code.

The code I am using is the following:

import java.util.*;
import java.io.*;

import org.biojava.bio.seq.*;
import org.biojava.bio.symbol.*;
import org.biojava.utils.*;
import org.biojava.bio.*;

/**
 * Temp class, just to check out some serialization issues I'm having.
 *
 * @author kalle
 */
public class AlignmentSerializationTest {

    public void run() throws Exception {
        Sequence dnaSeq1 = DNATools.createDNASequence( "---ATGC---ATGC---", "seq1" );

        dumpInfoAboutSequence( dnaSeq1 );

        System.out.println("Writing alignment to disk");

        File file = new File("/tmp/ali.obj");
        FileOutputStream fOS = new FileOutputStream( file );
        ObjectOutputStream oOS = new ObjectOutputStream( fOS );

        oOS.writeObject( dnaSeq1 );

        oOS.close();
        fOS.close();

        System.out.println( "Loading alignment from disk" );
        FileInputStream     fIS = new FileInputStream( file );
        ObjectInputStream   oIS = new ObjectInputStream( fIS );

        Sequence  serSeq  = ( Sequence )oIS.readObject();

        // Close the input streams, mirroring the output side above.
        oIS.close();
        fIS.close();

        dumpInfoAboutSequence( serSeq );
    }

    public static void main( String[] flags ) throws Exception {
        AlignmentSerializationTest myAST = new AlignmentSerializationTest();
        myAST.run();
    }

    private void dumpInfoAboutSequence( Sequence sequence ) throws Exception {
        System.out.println("Name      :" + sequence.getName() );
        System.out.println("Alphabet  :" + sequence.getAlphabet() );
        System.out.println("GapSymbol :" + sequence.getAlphabet().getGapSymbol() );
        System.out.println("Sequence  :" + sequence.seqString() );
        System.out.println("Tokeniz   :" + sequence.getAlphabet().getTokenization( "token" ) );
    }
}


And the output I get is:

Name      :seq1
Alphabet  :org.biojava.bio.symbol.AlphabetManager$ImmutableWellKnownAlphabetWrapper@1bc887b
GapSymbol :org.biojava.bio.symbol.SimpleBasisSymbol: []
Sequence  :---atgc---atgc---
Tokeniz   :org.biojava.bio.symbol.AlphabetManager$WellKnownTokenizationWrapper@120cc56

Writing alignment to disk

Loading alignment from disk

Name      :seq1
Alphabet  :org.biojava.bio.symbol.AlphabetManager$ImmutableWellKnownAlphabetWrapper@1bc887b
GapSymbol :org.biojava.bio.symbol.SimpleBasisSymbol: []
Sequence  :nnnatgcnnnatgcnnn
Tokeniz   :org.biojava.bio.symbol.AlphabetManager$WellKnownTokenizationWrapper@120cc56


I have spent some time stepping through the BioJava code in a debugger, 
but realised that it will most likely take me a lot of time. I was 
hoping that some of you who have more experience with the alphabet 
code could at least point me in the right direction, if not outright 
recognize the bug =)

kind regards Kalle
_______________________________________________
biojava-dev mailing list
biojava-dev at biojava.org
http://biojava.org/mailman/listinfo/biojava-dev



