[Biojava-l] SeqIOTools.readEmblNucleotide()

Roy Park RPark@lexgen.com
Tue, 23 Jul 2002 17:24:03 -0500


Hi Mark, how's things in NZ?

What you described is exactly what I've done locally as a quick hack.  For
the time being, it will have to do.  The target database for parsing is not
the regular EMBL sequences, but the Derwent GENESEQ patented nucleotides
(derwent.co.uk), which seem to include both DNA and RNA sequences.  I
imagine there are a number of pseudo-EMBL/GenBank formats out there, so
something like readEmblNucleotide() could be pretty useful.

I was thinking that it would have been more flexible to implement a generic
.readXXXX() first, with .readGenbank/Swissprot/Embl() as sub-classes that
override the parent methods to apply more specific filtering criteria, plus
an option to selectively load features and other properties.  Alternatively,
a generic .readXXXX() could parse many GenBank/EMBL-like files using
different XML configuration files (my application uses this approach to
filter out unwanted sequences).
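
A minimal sketch of the generic-reader idea, assuming records end with a
"//" line as in EMBL-like flat files.  The class and method names here
(FlatFileReaderSketch, readRecords, readRnaOnly) are hypothetical and are
not part of BioJava; the filter stands in for whatever criteria a sub-class
or configuration file would supply.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names are BioJava API.
final class FlatFileReaderSketch {

    // Generic reader: splits an EMBL-like stream into records ending with a
    // "//" line and keeps only those accepted by the supplied filter.
    static List<List<String>> readRecords(BufferedReader br,
                                          Predicate<List<String>> keep) {
        List<List<String>> kept = new ArrayList<>();
        List<String> current = new ArrayList<>();
        try {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.startsWith("//")) {
                    if (!current.isEmpty() && keep.test(current)) {
                        kept.add(current);
                    }
                    current = new ArrayList<>();
                } else {
                    current.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kept; // lines after the last "//" are ignored
    }

    // Format-specific entry point: same parsing path, narrower filter.
    static List<List<String>> readRnaOnly(BufferedReader br) {
        return readRecords(br, rec -> rec.get(0).startsWith("ID")
                && rec.get(0).contains("RNA"));
    }
}
```

The point of the design is that the parsing loop is written once, and each
format-specific method only decides which records to keep.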

Thanks for your input.  Looks like I'll be checking in those .xxRNAxx()
methods after all.

Roy


-----Original Message-----
From: Schreiber, Mark [mailto:mark.schreiber@agresearch.co.nz]
Sent: Tuesday, July 23, 2002 5:04 PM
To: Roy Park
Subject: RE: [Biojava-l] SeqIOTools.readEmblNucleotide()


I wasn't aware that EMBL contained sequences in the RNA alphabet.
However, solving the problem would involve some copying and modifying
of the existing methods readEmbl and getDNAParser.

You could copy readEmbl, name the new method readEmblRNA, and change
the getDNAParser call to getRNAParser (see below). Basically, IO follows
the strategy that you require a format, a tokenizer and a factory. In
this case only the tokenizer would be different.

    /**
     * Iterate over the RNA sequences in an EMBL-format stream.
     */
    public static SequenceIterator readEmblRNA(BufferedReader br) {
        return new StreamReader(br,
                                new EmblLikeFormat(),
                                getRNAParser(),
                                getEmblBuilderFactory());
    }



Now you actually need to make the method getRNAParser() by copying
getDNAParser(), renaming it, and changing the statement
return DNATools.getDNA().getTokenization("token"); to give the
following. I haven't tried compiling this but it should work.


    private static SymbolTokenization getRNAParser() {
        try {
            return RNATools.getRNA().getTokenization("token");
        } catch (BioException ex) {
            throw new BioError(ex,
                    "Assertion failing: Couldn't get RNA token parser");
        }
    }
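
To illustrate why only the tokenizer needs to change, here is a
dependency-free sketch of the same idea. Everything in it
(TokenizerSwapSketch, Tokenizer, readSequence) is a hypothetical stand-in
and not BioJava API; it assumes sequence lines may carry whitespace and
position counts, which are skipped.

```java
// Hypothetical sketch: a stand-in for the format/tokenizer/factory split,
// showing that the DNA and RNA readers differ only in the tokenizer.
final class TokenizerSwapSketch {

    // Stand-in for a symbol tokenizer: decides whether a character
    // belongs to the alphabet.
    interface Tokenizer {
        boolean accepts(char c);
    }

    static final Tokenizer DNA = c -> "acgt".indexOf(Character.toLowerCase(c)) >= 0;
    static final Tokenizer RNA = c -> "acgu".indexOf(Character.toLowerCase(c)) >= 0;

    // Shared parsing path; the tokenizer is the only part that varies.
    static String readSequence(String raw, Tokenizer tokenizer) {
        StringBuilder sb = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if (Character.isWhitespace(c) || Character.isDigit(c)) {
                continue; // skip spacing and position counts
            }
            if (!tokenizer.accepts(c)) {
                throw new IllegalArgumentException("Symbol not in alphabet: " + c);
            }
            sb.append(Character.toLowerCase(c));
        }
        return sb.toString();
    }

    static String readDna(String raw) {
        return readSequence(raw, DNA);
    }

    static String readRna(String raw) {
        return readSequence(raw, RNA);
    }
}
```

With this shape, a DNA-only reader rejects a "u" symbol while the RNA
reader accepts it, and no other part of the parsing code is duplicated.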

> -----Original Message-----
> From: Roy Park [mailto:RPark@lexgen.com] 
> Sent: Wednesday, 24 July 2002 7:31 a.m.
> To: 'biojava-l@biojava.org'
> Subject: [Biojava-l] SeqIOTools.readEmblNucleotide()
> 
> 
> As I look at the SeqIOTools.readXXXX() methods, I see that 
> readEmbl() is coded to work only with DNA Alphabets and 
> not RNA Alphabets at all.  I am in need of a more generic 
> readEmbl() for nucleotides - i.e. including RNA.  How do you 
> suggest solving this?  Give me some suggestions and 
> directions and I'll be glad to write it.  Thanks.
> 
> Roy Park
> 
> 
> 
> **************************************************************
> ************* 
>  The contents of this communication are intended only for the 
> addressee and may contain confidential and/or privileged 
> material. If you are not the intended recipient, please do 
> not read, copy, use or disclose this communication and notify 
> the sender.  Opinions, conclusions and other information in 
> this communication that do not relate to the official 
> business of my company shall be understood as neither given 
> nor endorsed by it.  
> **************************************************************
> ************* 
> 
> 
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org 
> http://biojava.org/mailman/listinfo/biojava-l
> 
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

