[Biojava-l] Editing a RichSequence[Scanned]
Jolyon Holdstock
jolyon.holdstock at ogt.co.uk
Tue Feb 19 12:12:23 UTC 2008
Hi,
Thanks for the workaround; it's better than using a StringBuffer to do it.
The problem with either approach is that I want to load a GenBank file, insert
some sequence, adjust the positions of the affected features and then output
the RichSequence in GenBank format.
If I make a copy of the SymbolList, I won't be able to output the adjusted
sequence, with its features etc., as a GenBank file.
I can do it in two steps by copying and pasting between the files produced. I
just wondered whether it is possible to do it in a single step with BioJava.
Cheers,
Jolyon
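The feature-adjustment step described above can be sketched in plain Java. This is a minimal model, not BioJava's feature API: the FeatureShift class, the Feature record and the shiftForEdit method are hypothetical names. It assumes 1-based, inclusive coordinates and the (pos, length, replacement) shape of BioJava's Edit, where length symbols starting at pos are replaced, so a pure insertion has length 0. Features wholly upstream are kept, features wholly downstream move by the net change in length, and anything overlapping the edited region is set aside for a manual decision rather than guessed at.

```java
import java.util.ArrayList;
import java.util.List;

public class FeatureShift {
    /** Hypothetical feature model: a label plus 1-based, inclusive coordinates. */
    record Feature(String name, int start, int end) {}

    /** Features carried over automatically, and features that overlap the edit. */
    record ShiftResult(List<Feature> adjusted, List<Feature> overlapping) {}

    /**
     * Models an edit that replaces {@code length} symbols starting at {@code pos}
     * with {@code replacementLength} new symbols (length 0 means a pure insertion).
     */
    static ShiftResult shiftForEdit(List<Feature> features, int pos, int length,
                                    int replacementLength) {
        int delta = replacementLength - length; // net change in sequence length
        int firstAfter = pos + length;          // first original position after the replaced region
        List<Feature> adjusted = new ArrayList<>();
        List<Feature> overlapping = new ArrayList<>();
        for (Feature f : features) {
            if (f.end() < pos) {
                adjusted.add(f);                // entirely upstream: coordinates unchanged
            } else if (f.start() >= firstAfter) {
                // entirely downstream: shift both ends by the net change
                adjusted.add(new Feature(f.name(), f.start() + delta, f.end() + delta));
            } else {
                overlapping.add(f);             // touches the edited region: needs a decision
            }
        }
        return new ShiftResult(adjusted, overlapping);
    }

    public static void main(String[] args) {
        List<Feature> features = List.of(
                new Feature("geneA", 100, 500),
                new Feature("geneB", 1050, 1080),
                new Feature("geneC", 2000, 2500));
        // Same shape as Edit(1000, 100, insertSeq) with the 16 bp insertSeq from this thread
        ShiftResult result = shiftForEdit(features, 1000, 100, 16);
        System.out.println("adjusted:    " + result.adjusted());
        System.out.println("overlapping: " + result.overlapping());
    }
}
```

With Edit(1000, 100, insertSeq) and a 16 bp insert, the net change is 16 - 100 = -84, so geneC moves from 2000-2500 to 1916-2416, while geneB, which lies inside the replaced region, is flagged instead of shifted.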
-----Original Message-----
From: Richard Holland [mailto:holland at ebi.ac.uk]
Sent: 18 February 2008 16:20
To: Jolyon Holdstock
Cc: biojava-l at biojava.org
Subject: Re: [Biojava-l] Editing a RichSequence[Scanned]
PS. The other workaround is to modify your local copy of BioJava: find
the ChunkedSymbolList class and change the 1<<14 CHUNK_SIZE limit to
a value larger than the longest sequence you need to edit.
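Richard's explanation in the quoted reply comes down to a single size test: sequences longer than 1<<14 bp get the chunked, uneditable representation. The plain-Java model below is a sketch of that test only (ChunkThreshold and wouldBeChunked are hypothetical names, not BioJava code, and the strict > comparison is taken from Richard's wording). It shows why Jolyon's 15 kb test sequence could be edited while the 4 Mb one could not.

```java
public class ChunkThreshold {
    // Assumption: mirrors the hardcoded 1<<14 limit described in this thread;
    // the real constant lives inside BioJava and may differ between versions.
    static final int CHUNK_LIMIT = 1 << 14; // 16,384 bp

    /** True if, per the behaviour described in the thread, this length would be stored chunked. */
    static boolean wouldBeChunked(long lengthBp) {
        return lengthBp > CHUNK_LIMIT;
    }

    public static void main(String[] args) {
        // Approximate sizes of the two sequences mentioned in the thread
        long[] lengths = {15_000L, 4_000_000L};
        for (long len : lengths) {
            String outcome = wouldBeChunked(len) ? "chunked, edit() vetoed" : "editable";
            System.out.println(len + " bp -> " + outcome);
        }
        System.out.println("limit = " + CHUNK_LIMIT + " bp");
    }
}
```

Under this model, any raised limit has to exceed the longest sequence you intend to edit. Because the chunked form exists to compress large sequences, raising the limit presumably also means those sequences are held uncompressed in memory.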
Richard Holland wrote:
> OK, got it.
>
> It's because ChunkedSymbolListFactory creates a ChunkedSymbolList for
> your sequence whenever the sequence is longer than 1<<14 bp (that is,
> 16,384 bp). This is a hardcoded limit.
>
> ChunkedSymbolList extends AbstractSymbolList, whose edit() method always
> throws ChangeVetoException ("AbstractSymbolList is immutable"), so the
> chunked list cannot be edited.
>
> I'm not sure who wrote ChunkedSymbolList, and I'm not sure how to fix it
> (or whether I should). It's quite a deeply embedded piece of the system.
>
> Does anyone out there know?
>
> There is a workaround: create a new symbol list based on the
> RichSequence (SymbolList syms = new SimpleSymbolList(richSeq);). The
> copy will be mutable, so edit() will work on it.
>
> cheers,
> Richard
>
> Jolyon Holdstock wrote:
>>> Hi,
>>>
>>> I tried using the readGenbank method with the following code...
>>>
>>> [code]
>>> import java.io.BufferedReader;
>>> import java.io.File;
>>> import java.io.FileNotFoundException;
>>> import java.io.FileReader;
>>> import java.io.IOException;
>>>
>>> import org.biojava.bio.BioException;
>>> import org.biojava.bio.symbol.Edit;
>>> import org.biojava.bio.symbol.SymbolList;
>>> import org.biojava.bio.seq.DNATools;
>>> import org.biojava.bio.seq.io.SymbolTokenization;
>>> import org.biojava.utils.ChangeVetoException;
>>>
>>> import org.biojavax.RichObjectFactory;
>>> import org.biojavax.bio.seq.RichSequence;
>>> import org.biojavax.bio.seq.io.RichSequenceBuilderFactory;
>>>
>>> public class EditBigSequence {
>>>     RichSequence richSeq;
>>>     Edit edit;
>>>
>>>     public EditBigSequence() {
>>>         try {
>>>             // Parse with readGenbank(), passing the tokenization and
>>>             // builder factory explicitly instead of using readGenbankDNA()
>>>             SymbolTokenization symbolTokenization =
>>>                 DNATools.getDNA().getTokenization("token");
>>>             richSeq = RichSequence.IOTools.readGenbank(
>>>                 new BufferedReader(new FileReader(new File("AF234172.gbk"))),
>>>                 symbolTokenization,
>>>                 RichSequenceBuilderFactory.FACTORY,
>>>                 RichObjectFactory.getDefaultNamespace()).nextRichSequence();
>>>
>>>             // Replace the 100 symbols starting at position 1000 with 16 bp
>>>             SymbolList insertSeq = DNATools.createDNA("AAAACCCCGGGGTTTT");
>>>             edit = new Edit(1000, 100, insertSeq);
>>>             richSeq.edit(edit); // this is the call that gets vetoed
>>>         }
>>>         catch (FileNotFoundException FNFE) {
>>>             System.out.println("FileNotFoundException: " + FNFE);
>>>         }
>>>         catch (BioException BIOE) {
>>>             System.out.println("BioException: " + BIOE);
>>>         }
>>>         catch (ChangeVetoException CVE) {
>>>             CVE.printStackTrace();
>>>             System.out.println("ChangeVetoException: " + CVE);
>>>         }
>>>         catch (IOException IOE) {
>>>             System.out.println("IOException: " + IOE);
>>>         }
>>>     }
>>>
>>>     public static void main(String[] args) {
>>>         EditBigSequence ebs = new EditBigSequence();
>>>     }
>>> }
>>> [/code]
>>>
>>> But I still got an error; the stack trace is below.
>>>
>>> org.biojava.utils.ChangeVetoException: AbstractSymbolList is immutable
>>> ChangeVetoException: org.biojava.utils.ChangeVetoException: AbstractSymbolList is immutable
>>>     at org.biojava.bio.symbol.AbstractSymbolList.edit(AbstractSymbolList.java:113)
>>>     at org.biojavax.bio.seq.DummyRichSequenceHandler.edit(DummyRichSequenceHandler.java:30)
>>>     at org.biojavax.bio.seq.ThinRichSequence.edit(ThinRichSequence.java:155)
>>>     at biojavahacks.EditBigSequence.<init>(EditBigSequence.java:47)
>>>     at biojavahacks.EditBigSequence.main(EditBigSequence.java:65)
>>>
>>>
>>> cheers,
>>>
>>> Jolyon
>>>
>>>
>>> -----Original Message-----
>>> From: Richard Holland [mailto:holland at ebi.ac.uk]
>>> Sent: 15 February 2008 15:17
>>> To: Jolyon Holdstock
>>> Cc: biojava-l at biojava.org
>>> Subject: Re: [Biojava-l] Editing a RichSequence[Scanned]
>>>
>>> I think it's because sequences are constructed internally by a
>>> ChunkedSymbolListFactory, which compresses large sequences, whereas small
>>> sequences are stored as normal, uncompressed ones. Compressed sequences
>>> rely on AbstractSymbolList's edit(), which always refuses changes
>>> (AbstractSymbolList is immutable, and therefore uneditable), whereas
>>> uncompressed ones support editing.
>>>
>>> You can disable the use of compressed sequences by using readGenbank()
>>> instead of readGenbankDNA() and passing in the DNA alphabet's tokenization
>>> and the non-compressed sequence factory (see the static constants in
>>> RichSequenceBuilderFactory).
>>>
>>> If this still doesn't work, please could you post the full stack trace so
>>> that we can see which class is throwing the exception, at which line,
>>> etc.
>>>
>>> cheers,
>>> Richard
>>>
>>> On Fri, February 15, 2008 2:44 pm, Jolyon Holdstock wrote:
>>>> Hi,
>>>>
>>>> I am trying to edit a Genbank sequence.
>>>> The code I'm using is as follows:
>>>>
>>>> [code]
>>>> richSeq = RichSequence.IOTools.readGenbankDNA(
>>>>     new BufferedReader(new FileReader(new File("U00096.gbk"))), null)
>>>>     .nextRichSequence();
>>>>
>>>> SymbolList sl1 = DNATools.createDNA("AAAGGGTTTCCC");
>>>> Edit editOne = new Edit(47078, 2690, sl1);
>>>> richSeq.edit(editOne);
>>>>
>>>> [/code]
>>>>
>>>> When it runs, it gives the following error:
>>>>
>>>> ChangeVetoException: org.biojava.utils.ChangeVetoException:
>>>> AbstractSymbolList is immutable
>>>>
>>>>
>>>> I have used the code for a smaller sequence (15 kb, compared with
>>>> 4 Mb) and it works.
>>>>
>>>> Does anyone have an idea why this is not working?
>>>>
>>>> Thanks,
>>>>
>>>> Jolyon
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Jolyon Holdstock Ph.D.
>>>> Senior Computational Biologist,
>>>> Oxford Gene Technology,
>>>> Begbroke Science Park,
>>>> Sandy Lane, Yarnton
>>>> Oxford, OX5 1PF
>>>>
>>>> Tel: +44 (0)1865 856852
>>>> Fax: +44 (0)1865 842116
>>>>
>>>>
>>>> _______________________________________________
>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>
>>>
>
> --
> Richard Holland (BioMart)
> EMBL EBI, Wellcome Trust Genome Campus,
> Hinxton, Cambridgeshire CB10 1SD, UK
> Tel. +44 (0)1223 494416
>
> http://www.biomart.org/
> http://www.biojava.org/
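The SimpleSymbolList workaround suggested in this thread relies on copying an immutable list into a mutable one. The sketch below models that pattern with hypothetical ReadOnlySequence and MutableSequence classes (stand-ins, not BioJava types) and a 1-based (pos, length, replacement) edit modelled on BioJava's Edit: the original refuses the edit, while the copy accepts it.

```java
public class CopyThenEdit {
    /** Hypothetical stand-in for a read-only list whose edit() always refuses, like the chunked list here. */
    static final class ReadOnlySequence {
        private final String residues;

        ReadOnlySequence(String residues) {
            this.residues = residues;
        }

        String seqString() {
            return residues;
        }

        void edit(int pos, int length, String replacement) {
            throw new UnsupportedOperationException("ReadOnlySequence is immutable");
        }
    }

    /** Hypothetical mutable copy, playing the role SimpleSymbolList plays in the workaround. */
    static final class MutableSequence {
        private final StringBuilder residues;

        MutableSequence(ReadOnlySequence source) {
            this.residues = new StringBuilder(source.seqString()); // independent copy
        }

        /** Replaces {@code length} symbols starting at 1-based {@code pos} with {@code replacement}. */
        void edit(int pos, int length, String replacement) {
            residues.replace(pos - 1, pos - 1 + length, replacement);
        }

        String seqString() {
            return residues.toString();
        }
    }

    public static void main(String[] args) {
        ReadOnlySequence original = new ReadOnlySequence("AAAACCCCGGGGTTTT");
        try {
            original.edit(5, 4, "TT");
        } catch (UnsupportedOperationException e) {
            System.out.println("original refused the edit: " + e.getMessage());
        }

        MutableSequence copy = new MutableSequence(original);
        copy.edit(5, 4, "TT"); // replace CCCC (positions 5-8) with TT
        System.out.println("copy:     " + copy.seqString());
        System.out.println("original: " + original.seqString());
    }
}
```

The copy is a separate object, so edits to it are not reflected in the original RichSequence or its features; that is the limitation Jolyon raises at the top of the thread.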
This email has been scanned by Oxford Gene Technology Security Systems.