[Biojava-l] SeqIOTools.readEmblNucleotide()

Schreiber, Mark mark.schreiber@agresearch.co.nz
Wed, 24 Jul 2002 10:46:36 +1200


> Hi Mark, how's things in NZ?
>

Not too bad, how's Texas?
 
> What you described was exactly what I've done locally as a 
> quick hack for the solution.  For the time being, it will 
> have to do.  The target database for parsing is not the 
> regular EMBL sequences, but the Derwent GENESEQ patented 
> nucleotides (derwent.co.uk), which seem to include both DNA 
> and RNA sequences.  I imagine there are a number of 
> pseudo-EMBL/GenBank formats out there - so something like 
> readEmblNucleotide() could be pretty useful.
> 

You could look at overriding readEmbl to try and guess the alphabet.
Guessing is always a bit uncertain though. I suppose you would assume DNA
unless the sequence contained a U character.
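
As a rough sketch of that heuristic (plain Java, not tied to BioJava;
NucleotideAlphabetGuesser and its Guess enum are made-up names, not anything
in SeqIOTools), something like this would treat the residues as RNA only
when a U turns up:

```java
/** Hypothetical helper, not part of BioJava: guesses the alphabet of raw residues. */
final class NucleotideAlphabetGuesser {

    /** The two outcomes this heuristic can report. */
    enum Guess { DNA, RNA }

    private NucleotideAlphabetGuesser() {
        // static utility class, no instances
    }

    /**
     * Returns RNA if the residues contain a 'U' in either case, otherwise DNA.
     * Caveat: an RNA sequence with no uracil at all is reported as DNA,
     * which is why this is only a guess.
     */
    static Guess guess(CharSequence residues) {
        for (int i = 0; i < residues.length(); i++) {
            char c = residues.charAt(i);
            if (c == 'u' || c == 'U') {
                return Guess.RNA;
            }
        }
        return Guess.DNA;
    }
}
```

The result could then decide whether to call readEmbl or an RNA variant of
it. A sequence containing both T and U comes back as RNA here, so mixed or
ambiguous entries would still need a manual check.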

> I was thinking that it would've been a bit more flexible to 
> have implemented a more generic version of .readXXXX(), then 
> .readGenbank/Swissprot/Embl() as sub-classes where you 
> override the parent methods to be more specific in filtering 
> criteria, as well as having an option to selectively load 
> features and other properties.  Or it would've been equally 
> cool to have a generic
> .readXXXX() then being able to parse many GenBank/EMBL like 
> files using different XML configuration files (my application 
> uses this approach to filter out unwanted sequences).
> 
> Thanks for your input.  Looks like I'll be checking in those 
> .xxRNAxx() methods after all.
> 

There are some more generic fileToBiojava methods in SeqIOTools now
which might be more akin to what you are talking about. The good thing
about SeqIOTools is that it's all just static methods. If you can think
of a better way, you could add your methods to SeqIOTools or make a
SeqIOTools2 or RoysIOTools or something similar.
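
To illustrate the static-methods-only style, here is a minimal sketch of
what such a class might look like. It uses only the standard library; the
readResidues helper is hypothetical, and it assumes the file follows the
EMBL convention of a sequence block that starts after an "SQ" line and ends
at "//", which may not hold for every pseudo-EMBL format:

```java
import java.io.BufferedReader;
import java.io.IOException;

/** Hypothetical utility class in the style of SeqIOTools: static methods only. */
final class RoysIOTools {

    private RoysIOTools() {
        // no instances, everything is static
    }

    /**
     * Reads one EMBL-like entry and returns the residues of its sequence
     * block, dropping whitespace and the position counts at the line ends.
     * Assumption: the block starts after a line beginning with "SQ" and
     * ends at the "//" terminator.
     */
    static String readResidues(BufferedReader br) throws IOException {
        StringBuilder residues = new StringBuilder();
        boolean inSequence = false;
        String line;
        while ((line = br.readLine()) != null) {
            if (line.startsWith("//")) {
                break; // end of this entry
            }
            if (inSequence) {
                // keep letters only; digits and spaces are layout
                for (int i = 0; i < line.length(); i++) {
                    char c = line.charAt(i);
                    if (Character.isLetter(c)) {
                        residues.append(c);
                    }
                }
            } else if (line.startsWith("SQ")) {
                inSequence = true; // residues start on the next line
            }
        }
        return residues.toString();
    }
}
```

Because the method is static, callers just write RoysIOTools.readResidues(br),
the same way they would call a SeqIOTools method. Note that it consumes the
reader up to the end of the entry, so it is a pre-check rather than a
replacement for the real parser.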

- Mark



> Roy
> 
> 
> -----Original Message-----
> From: Schreiber, Mark [mailto:mark.schreiber@agresearch.co.nz]
> Sent: Tuesday, July 23, 2002 5:04 PM
> To: Roy Park
> Subject: RE: [Biojava-l] SeqIOTools.readEmblNucleotide()
> 
> 
> I wasn't aware that EMBL contained sequences in the RNA 
> alphabet. However, solving the problem would involve some 
> copying and modifying of the existing methods readEmbl and 
> getDNAParser.
> 
> Copy readEmbl, name the new method readEmblRNA, and change 
> the getDNAParser() call to getRNAParser() (see below). 
> Basically, IO follows the strategy that you require a format, 
> a tokenizer and a factory. In this case only the tokenizer 
> would be different.
> 
>     /**
>      * Iterate over the sequences in an EMBL-format stream,
>      * tokenizing the residues with the RNA alphabet.
>      */
>     public static SequenceIterator readEmblRNA(BufferedReader br) {
>         return new StreamReader(br,
>                                 new EmblLikeFormat(),
>                                 getRNAParser(),
>                                 getEmblBuilderFactory());
>     }
> 
> 
> 
> Now you actually need to make the method getRNAParser() by copying
> getDNAParser(), renaming it, and changing the return statement
> DNATools.getDNA().getTokenization("token"); to give 
> the following. I haven't tried compiling this but it should work.
> 
> 
>     private static SymbolTokenization getRNAParser() {
>         try {
>             return RNATools.getRNA().getTokenization("token");
>         } catch (BioException ex) {
>             throw new BioError(ex,
>                 "Assertion failing: Couldn't get RNA token parser");
>         }
>     }
> 
> > -----Original Message-----
> > From: Roy Park [mailto:RPark@lexgen.com]
> > Sent: Wednesday, 24 July 2002 7:31 a.m.
> > To: 'biojava-l@biojava.org'
> > Subject: [Biojava-l] SeqIOTools.readEmblNucleotide()
> > 
> > 
> > As I look at the SeqIOTools.readXXXX() methods, I see that
> > readEmbl() is coded to work only with DNA Alphabets and 
> > not RNA Alphabets at all.  I am in need of a more generic 
> > readEmbl() for nucleotides - i.e. including RNA.  How do you 
> > suggest solving this?  Give me some suggestions and 
> > directions and I'll be glad to write it.  Thanks.
> > 
> > Roy Park
> > 
> > 
> > 
> > **************************************************************
> > *************
> >  The contents of this communication are intended only for the 
> > addressee and may contain confidential and/or privileged 
> > material. If you are not the intended recipient, please do 
> > not read, copy, use or disclose this communication and notify 
> > the sender.  Opinions, conclusions and other information in 
> > this communication that do not relate to the official 
> > business of my company shall be understood as neither given 
> > nor endorsed by it.  
> > **************************************************************
> > ************* 
> > 
> > 
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l@biojava.org
> > http://biojava.org/mailman/listinfo/biojava-l
> > 
> ==============================================================
> =========
> Attention: The information contained in this message and/or 
> attachments from AgResearch Limited is intended only for the 
> persons or entities to which it is addressed and may contain 
> confidential and/or privileged material. Any review, 
> retransmission, dissemination or other use of, or taking of 
> any action in reliance upon, this information by persons or 
> entities other than the intended recipients is prohibited by 
> AgResearch Limited. If you have received this message in 
> error, please notify the sender immediately. 
> ==============================================================
> =========