[Biojava-l] Editing a RichSequence[Scanned]

Mark Schreiber markjschreiber at gmail.com
Thu Feb 21 02:33:39 UTC 2008


Here is the solution (from the JavaDoc)

 public SimpleRichSequenceBuilderFactory(SymbolListFactory fact,
                                         int threshold)

 Creates a new instance of SimpleRichSequenceBuilderFactory that uses
 a specified factory for SymbolLists longer than a specified length.
 Up to that length a SimpleSymbolListFactory is used.

 Parameters:
   fact - the factory to use when building the SymbolList.
   threshold - the threshold to exceed before using this factory.
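
To make the selection rule in that JavaDoc concrete, here is a minimal sketch in plain Java. It is a toy model only, not BioJava code: ListFactory, SIMPLE, PACKED and choose() are hypothetical names invented for illustration, and the rule (use the supplied factory only when the length strictly exceeds the threshold) is taken from the JavaDoc wording above, not from the library source.

```java
/** Toy model of the selection rule quoted from the JavaDoc; none of this is BioJava code. */
public class ThresholdFactoryDemo {

    /** Hypothetical stand-in for SymbolListFactory: returns a label for the storage it would build. */
    public interface ListFactory {
        String build(int length);
    }

    // Hypothetical factories: SIMPLE plays the default, PACKED the caller-supplied factory.
    public static final ListFactory SIMPLE = length -> "simple(" + length + ")";
    public static final ListFactory PACKED = length -> "packed(" + length + ")";

    /** Returns fact only when length strictly exceeds threshold; otherwise the simple factory. */
    public static ListFactory choose(ListFactory fact, int threshold, int length) {
        return length > threshold ? fact : SIMPLE;
    }

    public static void main(String[] args) {
        int small = 15_000;     // roughly the 15kb sequence from the thread
        int large = 4_000_000;  // roughly the 4Mb sequence from the thread
        System.out.println(choose(PACKED, 5_000, small).build(small));
        System.out.println(choose(PACKED, 5_000, large).build(large));
        // A threshold that can never be exceeded keeps every length on the simple factory.
        System.out.println(choose(PACKED, Integer.MAX_VALUE, large).build(large));
    }
}
```

In this model a very large threshold means the supplied factory is never chosen. Whether that is enough to get an editable SymbolList out of the real builder should be checked against the BioJava source.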

On Tue, Feb 19, 2008 at 8:12 PM, Jolyon Holdstock
<jolyon.holdstock at ogt.co.uk> wrote:
> Hi,
>
> Thanks for the workaround, better than me using a StringBuffer to do it.
>
> The problem with either is that I want to load a Genbank file, insert
> some sequence, adjust the positions of affected features and then output
> the RichSequence in Genbank format.
>
> If I make a copy of the SymbolList I won't be able to output the adjusted
> sequence with the features etc... as a Genbank file.
>
> I can do it in 2 steps by copying and pasting from the files produced. I
> just wondered if it is possible to do it with BioJava in a single
> step.
>
>
> Cheers,
>
> Jolyon
>
>
> -----Original Message-----
> From: Richard Holland [mailto:holland at ebi.ac.uk]
> Sent: 18 February 2008 16:20
> To: Jolyon Holdstock
> Cc: biojava-l at biojava.org
> Subject: Re: [Biojava-l] Editing a RichSequence[Scanned]
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> PS. The other workaround is to modify your local copy of BioJava, find
> the ChunkedSymbolList class, and change the 1<<14 CHUNK_SIZE limit to
> some higher value.
>
> Richard Holland wrote:
> > OK, got it.
> >
> > It's because ChunkedSymbolListFactory is creating a ChunkedSymbolList
> > for your sequence, because the sequence is more than 1<<14 bp long
> > (that's 16384 symbols). This is a hardcoded limit.
> >
> > ChunkedSymbolList extends AbstractSymbolList, which is immutable and
> > therefore not editable.
> >
> > I'm not sure who wrote ChunkedSymbolList - and I'm not sure how to (or
> > if I should) fix it. It's quite a deeply embedded piece of the system.
> >
> > Does anyone out there know?
> >
> > There is a workaround - create a new symbol list based on the
> > RichSequence ( SymbolList syms = new SimpleSymbolList(richSeq) ). The
> > copy will be mutable and edit() will work on it.
> >
> > cheers,
> > Richard
> >
> > Jolyon Holdstock wrote:
> >>> Hi,
> >>>
> >>> I tried using the readGenbank method with the following code...
> >>>
> >>> [code]
> >>> import java.io.BufferedReader;
> >>> import java.io.File;
> >>> import java.io.FileNotFoundException;
> >>> import java.io.FileReader;
> >>> import java.io.IOException;
> >>>
> >>> import org.biojava.bio.BioException;
> >>> import org.biojava.bio.symbol.Edit;
> >>> import org.biojava.bio.symbol.SymbolList;
> >>> import org.biojava.bio.seq.DNATools;
> >>> import org.biojava.bio.seq.io.SymbolTokenization;
> >>> import org.biojava.utils.ChangeVetoException;
> >>>
> >>> import org.biojavax.RichObjectFactory;
> >>> import org.biojavax.bio.seq.RichSequence;
> >>> import org.biojavax.bio.seq.io.RichSequenceBuilderFactory;
> >>>
> >>> public class EditBigSequence {
> >>>   RichSequence richSeq;
> >>>   Edit edit;
> >>>
> >>>   public EditBigSequence() {
> >>>     try {
> >>>       SymbolTokenization symbolTokenization =
> >>>           DNATools.getDNA().getTokenization("token");
> >>>       richSeq = RichSequence.IOTools.readGenbank(
> >>>           new BufferedReader(new FileReader(new File("AF234172.gbk"))),
> >>>           symbolTokenization,
> >>>           RichSequenceBuilderFactory.FACTORY,
> >>>           RichObjectFactory.getDefaultNamespace()).nextRichSequence();
> >>>
> >>>       SymbolList insertSeq = DNATools.createDNA("AAAACCCCGGGGTTTT");
> >>>       edit = new Edit(1000, 100, insertSeq);
> >>>       richSeq.edit(edit);
> >>>     }
> >>>     catch (FileNotFoundException FNFE){
> >>>       System.out.println("FileNotFoundException: " + FNFE);
> >>>     }
> >>>     catch (BioException BIOE){
> >>>       System.out.println("BioException: " + BIOE);
> >>>     }
> >>>     catch (ChangeVetoException CVE){
> >>>       CVE.printStackTrace();
> >>>       System.out.println("ChangeVetoException: " + CVE);
> >>>     }
> >>>     catch (IOException IOE){
> >>>       System.out.println("IOException: " + IOE);
> >>>     }
> >>>   }
> >>>
> >>>   public static void main(String args []){
> >>>     EditBigSequence ebs = new EditBigSequence();
> >>>   }
> >>> }
> >>> [/code]
> >>>
> >>> But I still got an error, for which the stack trace is below.
> >>>
> >>> org.biojava.utils.ChangeVetoException: AbstractSymbolList is immutable
> >>> ChangeVetoException: org.biojava.utils.ChangeVetoException: AbstractSymbolList is immutable
> >>>         at org.biojava.bio.symbol.AbstractSymbolList.edit(AbstractSymbolList.java:113)
> >>>         at org.biojavax.bio.seq.DummyRichSequenceHandler.edit(DummyRichSequenceHandler.java:30)
> >>>         at org.biojavax.bio.seq.ThinRichSequence.edit(ThinRichSequence.java:155)
> >>>         at biojavahacks.EditBigSequence.<init>(EditBigSequence.java:47)
> >>>         at biojavahacks.EditBigSequence.main(EditBigSequence.java:65)
> >>>
> >>>
> >>> cheers,
> >>>
> >>> Jolyon
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Richard Holland [mailto:holland at ebi.ac.uk]
> >>> Sent: 15 February 2008 15:17
> >>> To: Jolyon Holdstock
> >>> Cc: biojava-l at biojava.org
> >>> Subject: Re: [Biojava-l] Editing a RichSequence[Scanned]
> >>>
> >>> I think it's because sequences are constructed internally in a
> >>> ChunkedSymbolListFactory which compresses large sequences whereas small
> >>> sequences are stored as normal uncompressed ones. Compressed sequences
> >>> extend AbstractSymbolList, which is immutable (and therefore uneditable)
> >>> whereas uncompressed ones do not, and hence are editable.
> >>>
> >>> You can disable the use of compressed sequences by using readGenbank()
> >>> instead of readGenbankDNA() and passing in the DNA alphabet and the
> >>> non-compressed sequence factory (see the static constants in
> >>> RichSequenceBuilderFactory).
> >>>
> >>> If this still doesn't work, please could you post the full stacktrace so
> >>> that we can see which class is throwing the exception and at what line
> >>> etc.
> >>>
> >>> cheers,
> >>> Richard
> >>>
> >>> On Fri, February 15, 2008 2:44 pm, Jolyon Holdstock wrote:
> >>>> Hi,
> >>>>
> >>>> I am trying to edit a Genbank sequence.
> >>>> The code I'm using is as follows:
> >>>>
> >>>> [code]
> >>>> richSeq = RichSequence.IOTools.readGenbankDNA(
> >>>>     new BufferedReader(new FileReader(new File("U00096.gbk"))),
> >>>>     null).nextRichSequence();
> >>>>
> >>>> SymbolList sl1 = DNATools.createDNA("AAAGGGTTTCCC");
> >>>> Edit editOne = new Edit(47078, 2690, sl1);
> >>>> richSeq.edit(editOne);
> >>>>
> >>>> [/code]
> >>>>
> >>>> When it runs it gives the following error
> >>>>
> >>>> ChangeVetoException: org.biojava.utils.ChangeVetoException:
> >>>> AbstractSymbolList is immutable
> >>>>
> >>>>
> >>>> I have used the code for a smaller sequence (15kb, compared with 4Mb)
> >>>> and it works.
> >>>>
> >>>> Does anyone have an idea why this is not working?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Jolyon
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Jolyon Holdstock Ph.D.
> >>>> Senior Computational Biologist,
> >>>> Oxford Gene Technology,
> >>>> Begbroke Science Park,
> >>>> Sandy Lane, Yarnton
> >>>> Oxford, OX5 1PF
> >>>>
> >>>> Tel: +44 (0)1865 856852
> >>>> Fax: +44 (0)1865 842116
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>>>
> >>>
> >
> > --
> > Richard Holland (BioMart)
> > EMBL EBI, Wellcome Trust Genome Campus,
> > Hinxton, Cambridgeshire CB10 1SD, UK
> > Tel. +44 (0)1223 494416
> >
> > http://www.biomart.org/
> > http://www.biojava.org/
>
> - --
> Richard Holland (BioMart)
> EMBL EBI, Wellcome Trust Genome Campus,
> Hinxton, Cambridgeshire CB10 1SD, UK
> Tel. +44 (0)1223 494416
>
> http://www.biomart.org/
> http://www.biojava.org/
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFHubA+4C5LeMEKA/QRAnaXAJ9qec6JaBIAroziiOYOM+NUIsQGHQCghT9P
> zOsc+G843TiPRPGw8YaSG3Q=
> =O/UX
> -----END PGP SIGNATURE-----
>
> This email has been scanned by Oxford Gene Technology Security Systems.
>
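
For the goal Jolyon states earlier in the thread (insert some sequence, then adjust the positions of affected features), the coordinate bookkeeping can be sketched on its own in plain Java. This is only an illustration under stated assumptions: Span and shift() are hypothetical names, not BioJava types, coordinates are taken to be 1-based and inclusive, and the edit is assumed to replace `length` symbols starting at `pos`, which is how Edit(1000, 100, insertSeq) is used in the quoted code. Features that overlap the edited region are left unchanged and reported for manual review, since the thread does not say how they should be handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of feature coordinate adjustment after an edit; none of this is BioJava code. */
public class FeatureShiftDemo {

    /** Hypothetical feature span with 1-based, inclusive coordinates. */
    public static final class Span {
        public final String name;
        public final int start;
        public final int end;

        public Span(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Adjusts spans for an edit that replaces `length` symbols starting at `pos`
     * with `replacementLength` symbols. Names of spans that overlap the edited
     * region are appended to `needsReview`, and those spans are returned unchanged.
     */
    public static List<Span> shift(List<Span> spans, int pos, int length,
                                   int replacementLength, List<String> needsReview) {
        int delta = replacementLength - length; // net change in sequence length
        int editEnd = pos + length - 1;         // last replaced position (pos - 1 for a pure insertion)
        List<Span> out = new ArrayList<>();
        for (Span s : spans) {
            if (s.end < pos) {
                out.add(s);                     // entirely upstream of the edit: unchanged
            } else if (s.start > editEnd) {
                out.add(new Span(s.name, s.start + delta, s.end + delta)); // entirely downstream
            } else {
                needsReview.add(s.name);        // overlaps the edit: no automatic rule applied
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> spans = new ArrayList<>();
        spans.add(new Span("upstream", 10, 900));
        spans.add(new Span("overlapping", 950, 1050));
        spans.add(new Span("downstream", 1200, 1500));
        List<String> review = new ArrayList<>();
        // Same shape as the thread's Edit(1000, 100, insertSeq) with a 16-symbol replacement.
        for (Span s : shift(spans, 1000, 100, 16, review)) {
            System.out.println(s.name + " " + s.start + ".." + s.end);
        }
        System.out.println("needs review: " + review);
    }
}
```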


