[Biojava-l] Editing a RichSequence[Scanned]

Richard Holland holland at ebi.ac.uk
Mon Feb 18 16:20:14 UTC 2008


PS. The other workaround is to modify your local copy of BioJava, find
the ChunkedSymbolList class, and change the 1<<14 CHUNK_SIZE limit to
some higher value.
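
For reference, the arithmetic behind that limit can be checked with a few
lines of plain Java. This is only a sketch: the class and method names are
made up for illustration, the constant merely mirrors the 1<<14 value
mentioned above, and the two lengths are round stand-ins for the "15kb" and
"4Mb" sequences reported in this thread.

```java
class ChunkThreshold {
    // Mirrors the 1<<14 limit discussed in this thread (assumed value, illustrative name).
    static final int CHUNK_SIZE = 1 << 14;

    // True when a sequence of this many symbols is longer than the limit.
    static boolean exceedsChunkSize(long length) {
        return length > CHUNK_SIZE;
    }

    public static void main(String[] args) {
        System.out.println("CHUNK_SIZE = " + CHUNK_SIZE);
        // Round stand-ins for the sequence sizes reported in the thread.
        System.out.println("15000 symbols over limit? " + exceedsChunkSize(15_000));
        System.out.println("4000000 symbols over limit? " + exceedsChunkSize(4_000_000));
    }
}
```

A 15 kb sequence stays under 16384 symbols, which would be consistent with
the smaller file being editable while the 4 Mb one is not.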

Richard Holland wrote:
> OK, got it.
> 
> It's because ChunkedSymbolListFactory is creating a ChunkedSymbolList
> for your sequence, because the sequence is greater than 1<<14 bp long
> (that's 16384 symbols). This is a hardcoded limit.
> 
> ChunkedSymbolList extends AbstractSymbolList, which is immutable and
> therefore not editable.
> 
> I'm not sure who wrote ChunkedSymbolList - and I'm not sure how to (or
> if I should) fix it. It's quite a deeply embedded piece of the system.
> 
> Does anyone out there know?
> 
> There is a workaround - create a new symbol list based on the
> RichSequence ( SymbolList syms = new SimpleSymbolList(richSeq) ). The
> copy will be mutable and edit() will work on it.
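
The idea behind that workaround can be sketched without BioJava at all. The
classes below are hypothetical stand-ins written for this illustration, not
BioJava's own: a base type whose edit() always refuses, a "chunked" subclass
that inherits the refusal, and a copy that owns its own buffer and overrides
edit(). The 1-based position and the (pos, length, replacement) shape are
assumptions chosen to resemble the Edit calls in the quoted code.

```java
class CopyThenEdit {
    // Hypothetical stand-in for an immutable base list: edit() always refuses.
    static abstract class ReadOnlySymbols {
        abstract String seqString();

        void edit(int pos, int length, String replacement) {
            throw new UnsupportedOperationException("symbol list is immutable");
        }
    }

    // Stand-in for the chunked list: inherits the refusing edit().
    static class ChunkedSymbols extends ReadOnlySymbols {
        private final String symbols;
        ChunkedSymbols(String symbols) { this.symbols = symbols; }
        @Override String seqString() { return symbols; }
    }

    // Stand-in for the mutable copy: holds its own buffer and overrides edit().
    static class MutableSymbols extends ReadOnlySymbols {
        private final StringBuilder symbols;
        MutableSymbols(ReadOnlySymbols source) {
            this.symbols = new StringBuilder(source.seqString());
        }
        @Override String seqString() { return symbols.toString(); }

        // pos is 1-based (an assumption of this model); 'length' symbols are replaced.
        @Override void edit(int pos, int length, String replacement) {
            symbols.replace(pos - 1, pos - 1 + length, replacement);
        }
    }

    public static void main(String[] args) {
        ReadOnlySymbols chunked = new ChunkedSymbols("AAAACCCCGGGGTTTT");
        try {
            chunked.edit(5, 4, "TT");
        } catch (UnsupportedOperationException e) {
            System.out.println("edit refused: " + e.getMessage());
        }
        MutableSymbols copy = new MutableSymbols(chunked);
        copy.edit(5, 4, "TT");                   // replaces CCCC with TT
        System.out.println(copy.seqString());    // the edited copy
        System.out.println(chunked.seqString()); // the source is left untouched
    }
}
```

In this model the edit changes only the copy, so any later code has to work
with the copy rather than with the object it was made from.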
> 
> cheers,
> Richard
> 
> Jolyon Holdstock wrote:
>>> Hi,
>>>
>>> I tried using the readGenbank method with the following code...
>>>
>>> [code]
>>> import java.io.BufferedReader;
>>> import java.io.File;
>>> import java.io.FileNotFoundException;
>>> import java.io.FileReader;
>>> import java.io.IOException;
>>>
>>> import org.biojava.bio.BioException;
>>> import org.biojava.bio.symbol.Edit;
>>> import org.biojava.bio.symbol.SymbolList;
>>> import org.biojava.bio.seq.DNATools;
>>> import org.biojava.bio.seq.io.SymbolTokenization;
>>> import org.biojava.utils.ChangeVetoException;
>>>
>>> import org.biojavax.RichObjectFactory;
>>> import org.biojavax.bio.seq.RichSequence;
>>> import org.biojavax.bio.seq.io.RichSequenceBuilderFactory;
>>>
>>> public class EditBigSequence {
>>>   RichSequence richSeq;
>>>   Edit edit;
>>>
>>>   public EditBigSequence() {
>>>     try {
>>>       SymbolTokenization symbolTokenization =
>>>           DNATools.getDNA().getTokenization("token");
>>>       richSeq = RichSequence.IOTools.readGenbank(
>>>           new BufferedReader(new FileReader(new File("AF234172.gbk"))),
>>>           symbolTokenization,
>>>           RichSequenceBuilderFactory.FACTORY,
>>>           RichObjectFactory.getDefaultNamespace()).nextRichSequence();
>>>      
>>>       SymbolList insertSeq = DNATools.createDNA("AAAACCCCGGGGTTTT");
>>>       edit = new Edit(1000, 100, insertSeq);
>>>       richSeq.edit(edit);
>>>     }
>>>     catch (FileNotFoundException FNFE){  
>>>       System.out.println("FileNotFoundException: " + FNFE);
>>>     }
>>>     catch (BioException BIOE){
>>>       System.out.println("BioException: " + BIOE);
>>>     }
>>>     catch (ChangeVetoException CVE){
>>>       CVE.printStackTrace();
>>>       System.out.println("ChangeVetoException: " + CVE);
>>>     }
>>>     catch (IOException IOE){
>>>       System.out.println("IOException: " + IOE);
>>>     }
>>>   }
>>>   
>>>   public static void main(String args []){
>>>     EditBigSequence ebs = new EditBigSequence();
>>>   }
>>> }
>>> [/code]
>>>
>>> But I still got an error; the stack trace is below.
>>>
>>> org.biojava.utils.ChangeVetoException: AbstractSymbolList is immutable
>>> ChangeVetoException: org.biojava.utils.ChangeVetoException: AbstractSymbolList is immutable
>>>         at org.biojava.bio.symbol.AbstractSymbolList.edit(AbstractSymbolList.java:113)
>>>         at org.biojavax.bio.seq.DummyRichSequenceHandler.edit(DummyRichSequenceHandler.java:30)
>>>         at org.biojavax.bio.seq.ThinRichSequence.edit(ThinRichSequence.java:155)
>>>         at biojavahacks.EditBigSequence.<init>(EditBigSequence.java:47)
>>>         at biojavahacks.EditBigSequence.main(EditBigSequence.java:65)
>>>
>>>
>>> cheers,
>>>
>>> Jolyon
>>>
>>>
>>> -----Original Message-----
>>> From: Richard Holland [mailto:holland at ebi.ac.uk] 
>>> Sent: 15 February 2008 15:17
>>> To: Jolyon Holdstock
>>> Cc: biojava-l at biojava.org
>>> Subject: Re: [Biojava-l] Editing a RichSequence[Scanned]
>>>
>>> I think it's because sequences are constructed internally in a
>>> ChunkedSymbolListFactory which compresses large sequences whereas small
>>> sequences are stored as normal uncompressed ones. Compressed sequences
>>> extend AbstractSymbolList, which is immutable (and therefore uneditable)
>>> whereas uncompressed ones do not, and hence are editable.
>>>
>>> You can disable the use of compressed sequences by using readGenbank()
>>> instead of readGenbankDNA() and passing in the DNA alphabet and the
>>> non-compressed sequence factory (see the static constants in
>>> RichSequenceBuilderFactory).
>>>
>>> If this still doesn't work, please could you post the full stacktrace so
>>> that we can see which class is throwing the exception and at what line
>>> etc.
>>>
>>> cheers,
>>> Richard
>>>
>>> On Fri, February 15, 2008 2:44 pm, Jolyon Holdstock wrote:
>>>> Hi,
>>>>
>>>> I am trying to edit a Genbank sequence.
>>>> The code I'm using is as follows:
>>>>
>>>> [code]
>>>> richSeq = RichSequence.IOTools.readGenbankDNA(
>>>>     new BufferedReader(new FileReader(new File("U00096.gbk"))),
>>>>     null).nextRichSequence();
>>>>
>>>> SymbolList sl1 = DNATools.createDNA("AAAGGGTTTCCC");
>>>> Edit editOne = new Edit(47078, 2690, sl1);
>>>> richSeq.edit(editOne);
>>>>
>>>> [/code]
>>>>
>>>> When it runs it gives the following error
>>>>
>>>> ChangeVetoException: org.biojava.utils.ChangeVetoException:
>>>> AbstractSymbolList is immutable
>>>>
>>>>
>>>> I have used the code for a smaller sequence (15kb, compared with 4Mb)
>>>> and it works.
>>>>
>>>> Does anyone have an idea why this is not working?
>>>>
>>>> Thanks,
>>>>
>>>> Jolyon
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Jolyon Holdstock Ph.D.
>>>> Senior Computational Biologist,
>>>> Oxford Gene Technology,
>>>> Begbroke Science Park,
>>>> Sandy Lane, Yarnton
>>>> Oxford, OX5 1PF
>>>>
>>>> Tel: +44 (0)1865 856852
>>>> Fax: +44 (0)1865 842116
>>>>
>>>>
>>>> _______________________________________________
>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>
>>>
> 
> --
> Richard Holland (BioMart)
> EMBL EBI, Wellcome Trust Genome Campus,
> Hinxton, Cambridgeshire CB10 1SD, UK
> Tel. +44 (0)1223 494416
> 
> http://www.biomart.org/
> http://www.biojava.org/

--
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/