[Biojava-l] Editing a RichSequence[Scanned]

Richard Holland holland at ebi.ac.uk
Mon Feb 18 16:12:52 UTC 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

OK, got it.

It's because ChunkedSymbolListFactory is creating a ChunkedSymbolList
for your sequence, because the sequence is longer than 1<<14 symbols
(that's exactly 16384 bp). This is a hardcoded limit.
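
To make the arithmetic concrete, here is a tiny sketch of that cutoff.
The class and method names are made up for illustration and are not part
of BioJava; the only thing taken from above is the 1<<14 figure and the
"greater than" comparison.

```java
// Hypothetical helper, not a BioJava class: it only illustrates the cutoff.
class ChunkThreshold {
    // 1 << 14 = 2^14 = 16384 symbols, the limit described in this message.
    static final int CHUNK_LIMIT = 1 << 14;

    // Assumption: a sequence is chunked only when strictly longer than the limit.
    static boolean wouldBeChunked(int sequenceLength) {
        return sequenceLength > CHUNK_LIMIT;
    }

    public static void main(String[] args) {
        System.out.println(CHUNK_LIMIT);                // 16384
        System.out.println(wouldBeChunked(15_000));     // false: the 15kb case
        System.out.println(wouldBeChunked(4_000_000));  // true: the 4Mb case
    }
}
```

That matches what you saw: the 15kb sequence stays under the limit and
can be edited, the 4Mb one does not.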

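ChunkedSymbolList extends AbstractSymbolList and inherits its edit()
method, which is the one throwing the "AbstractSymbolList is immutable"
ChangeVetoException in your stack trace. So the chunked list is not
editable.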

I'm not sure who wrote ChunkedSymbolList - and I'm not sure how to (or
if I should) fix it. It's quite a deeply embedded piece of the system.

Does anyone out there know?

There is a workaround - create a new symbol list based on the
RichSequence ( SymbolList syms = new SimpleSymbolList(richSeq) ). The
copy will be mutable and edit() will work on it.
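
The idea behind the workaround, sketched without any BioJava dependency
so it can be run as-is. ReadOnlySymbols and EditableSymbols are
hypothetical stand-ins (roughly playing the parts of the chunked list
and of SimpleSymbolList); they are not real BioJava classes and the real
ones have richer APIs. Positions are 1-based, as in BioJava.

```java
// Library-free model of "copy, then edit the copy". All names are hypothetical.
class EditDemo {
    /** Stand-in for a symbol list that refuses edits. */
    static class ReadOnlySymbols {
        private final String symbols;

        ReadOnlySymbols(String symbols) {
            this.symbols = symbols;
        }

        String seqString() {
            return symbols;
        }

        // Default behaviour: every edit is rejected.
        void edit(int pos, int length, String replacement) {
            throw new UnsupportedOperationException("ReadOnlySymbols is immutable");
        }
    }

    /** Stand-in for a mutable copy built from any other symbol list. */
    static class EditableSymbols extends ReadOnlySymbols {
        private final StringBuilder buffer;

        EditableSymbols(ReadOnlySymbols source) {
            super(source.seqString());
            this.buffer = new StringBuilder(source.seqString());
        }

        @Override
        String seqString() {
            return buffer.toString();
        }

        // Replace 'length' symbols starting at 1-based 'pos' with 'replacement'.
        @Override
        void edit(int pos, int length, String replacement) {
            buffer.replace(pos - 1, pos - 1 + length, replacement);
        }
    }

    public static void main(String[] args) {
        ReadOnlySymbols original = new ReadOnlySymbols("AAAACCCCGGGGTTTT");
        try {
            original.edit(5, 4, "TT");
        } catch (UnsupportedOperationException e) {
            System.out.println("vetoed: " + e.getMessage());
        }

        EditableSymbols copy = new EditableSymbols(original);
        copy.edit(5, 4, "TT");
        System.out.println(copy.seqString());      // AAAATTGGGGTTTT
        System.out.println(original.seqString());  // AAAACCCCGGGGTTTT
    }
}
```

Note that the edit lands on the copy only; the original is left as it
was. In the BioJava case the copy is a plain SymbolList, so I would
expect the features and annotations of the RichSequence to stay behind
on the original rather than travel with the copy.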

cheers,
Richard

Jolyon Holdstock wrote:
> Hi,
> 
> I tried using the readGenbank method with the following code...
> 
> [code]
> import java.io.BufferedReader;
> import java.io.File;
> import java.io.FileNotFoundException;
> import java.io.FileReader;
> import java.io.IOException;
> 
> import org.biojava.bio.BioException;
> import org.biojava.bio.symbol.Edit;
> import org.biojava.bio.symbol.SymbolList;
> import org.biojava.bio.seq.DNATools;
> import org.biojava.bio.seq.io.SymbolTokenization;
> import org.biojava.utils.ChangeVetoException;
> 
> import org.biojavax.RichObjectFactory;
> import org.biojavax.bio.seq.RichSequence;
> import org.biojavax.bio.seq.io.RichSequenceBuilderFactory;
> 
> public class EditBigSequence {
>   RichSequence richSeq;
>   Edit edit;
> 
>   public EditBigSequence() {
>     try {
>       SymbolTokenization symbolTokenization =
>           DNATools.getDNA().getTokenization("token");
>       richSeq = RichSequence.IOTools.readGenbank(
>           new BufferedReader(new FileReader(new File("AF234172.gbk"))),
>           symbolTokenization,
>           RichSequenceBuilderFactory.FACTORY,
>           RichObjectFactory.getDefaultNamespace()).nextRichSequence();
>      
>       SymbolList insertSeq = DNATools.createDNA("AAAACCCCGGGGTTTT");
>       edit = new Edit(1000, 100, insertSeq);
>       richSeq.edit(edit);
>     }
>     catch (FileNotFoundException FNFE){  
>       System.out.println("FileNotFoundException: " + FNFE);
>     }
>     catch (BioException BIOE){
>       System.out.println("BioException: " + BIOE);
>     }
>     catch (ChangeVetoException CVE){
>       CVE.printStackTrace();
>       System.out.println("ChangeVetoException: " + CVE);
>     }
>     catch (IOException IOE){
>       System.out.println("IOException: " + IOE);
>     }
>   }
>   
>   public static void main(String args []){
>     EditBigSequence ebs = new EditBigSequence();
>   }
> }
> [/code]
> 
> But I still got an error, for which the stack trace is below.
> 
> org.biojava.utils.ChangeVetoException: AbstractSymbolList is immutable
> ChangeVetoException: org.biojava.utils.ChangeVetoException: AbstractSymbolList is immutable
>         at org.biojava.bio.symbol.AbstractSymbolList.edit(AbstractSymbolList.java:113)
>         at org.biojavax.bio.seq.DummyRichSequenceHandler.edit(DummyRichSequenceHandler.java:30)
>         at org.biojavax.bio.seq.ThinRichSequence.edit(ThinRichSequence.java:155)
>         at biojavahacks.EditBigSequence.<init>(EditBigSequence.java:47)
>         at biojavahacks.EditBigSequence.main(EditBigSequence.java:65)
> 
> 
> cheers,
> 
> Jolyon
> 
> 
> -----Original Message-----
> From: Richard Holland [mailto:holland at ebi.ac.uk] 
> Sent: 15 February 2008 15:17
> To: Jolyon Holdstock
> Cc: biojava-l at biojava.org
> Subject: Re: [Biojava-l] Editing a RichSequence[Scanned]
> 
> I think it's because sequences are constructed internally in a
> ChunkedSymbolListFactory which compresses large sequences whereas small
> sequences are stored as normal uncompressed ones. Compressed sequences
> extend AbstractSymbolList, which is immutable (and therefore uneditable)
> whereas uncompressed ones do not, and hence are editable.
> 
> You can disable the use of compressed sequences by using readGenbank()
> instead of readGenbankDNA() and passing in the DNA alphabet and the
> non-compressed sequence factory (see the static constants in
> RichSequenceBuilderFactory).
> 
> If this still doesn't work, please could you post the full stacktrace so
> that we can see which class is throwing the exception and at what line
> etc.
> 
> cheers,
> Richard
> 
> On Fri, February 15, 2008 2:44 pm, Jolyon Holdstock wrote:
>> Hi,
>>
>> I am trying to edit a Genbank sequence.
>> The code I'm using is as follows:
>>
>> [code]
>> richSeq = RichSequence.IOTools.readGenbankDNA(new BufferedReader(new
>> FileReader(new File("U00096.gbk"))), null).nextRichSequence();
>>
>> SymbolList sl1 = DNATools.createDNA("AAAGGGTTTCCC");
>> Edit editOne = new Edit(47078, 2690, sl1);
>> richSeq.edit(editOne);
>>
>> [/code]
>>
>> When it runs it gives the following error
>>
>> ChangeVetoException: org.biojava.utils.ChangeVetoException:
>> AbstractSymbolList is immutable
>>
>>
>> I have used the code for a smaller sequence (15kb, compared with 4Mb)
>> and it works.
>>
>> Does anyone have an idea why this is not working?
>>
>> Thanks,
>>
>> Jolyon
>>
>>
>>
>>
>>
>> Jolyon Holdstock Ph.D.
>> Senior Computational Biologist,
>> Oxford Gene Technology,
>> Begbroke Science Park,
>> Sandy Lane, Yarnton
>> Oxford, OX5 1PF
>>
>> Tel: +44 (0)1865 856852
>> Fax: +44 (0)1865 842116
>>
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
> 
> 

- --
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHua6D4C5LeMEKA/QRAn/WAJ9sTII9aMU60LWdQvlgy1Ntp60q0QCdFeYa
w60vXjENWcQLCiBf1ezRgh8=
=M4J7
-----END PGP SIGNATURE-----


