[Biojava-l] Re: Contents of Biojava-l digest...

Sakellaris gsake@cs.uoi.gr
Sat, 27 Jul 2002 23:24:27 +0300


----- Original Message -----
From: <biojava-l-request@biojava.org>
To: <biojava-l@biojava.org>
Sent: Saturday, 27 July, 2002 7:00 G.Sake
Subject: Biojava-l digest, Vol 1 #715 - 3 msgs


> Send Biojava-l mailing list submissions to
> biojava-l@biojava.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://biojava.org/mailman/listinfo/biojava-l
> or, via email, send a message with subject or body 'help' to
> biojava-l-request@biojava.org
>
> You can reach the person managing the list at
> biojava-l-admin@biojava.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Biojava-l digest..."
>
>
> Today's Topics:
>
>    1. SeqIOTools.readXXXXFields() method?? (Roy Park)
>    2. Re: adding toSequenceIterator method for Alignment (Kalle Näslund)
>    3. Re: SeqIOTools.readXXXXFields() method?? (Mark Fortner)
>
> --__--__--
>
> Message: 1
> From: Roy Park <RPark@lexgen.com>
> To: "'biojava-l@biojava.org'" <biojava-l@biojava.org>
> Date: Fri, 26 Jul 2002 11:16:06 -0500
> Subject: [Biojava-l] SeqIOTools.readXXXXFields() method??
>
> Hello everyone.
>
> I deal with a number of pseudo EMBL/GenBank formatted sequences, and it
> would be extremely nice (?) to have methods that only attempt to parse out
> specified fields.
>
> The primary reason for this is that, right now, format.readSequence()
> throws BioException way too frequently for my purposes, i.e. although I only
> need fields X, Y and Z from each sequence definition, readSequence()
> throws an exception when it finds field W to be malformed, etc.
>
> I see that modified versions of the StreamReader class, the SequenceFormat
> implementing classes, etc. have to be written, which I can do.  I'm wondering
> if anyone could suggest a preferred way of passing the desired fields to be
> read.
>
> readXXXXFields(BufferedReader _br, ArrayList(of String) _fieldsToBeParsed)..
> or
> readXXXXFields(BufferedReader _br, String[] _fieldsToBeParsed)..etc.
>
> (I think the readXXXXX(BufferedReader) should be called if the second
> argument is null.)
>
> Any input would be greatly appreciated.  (what about the naming of the
> methods - readXXXXPartial()??)
>
> Roy K. Park
> Bioinformatics Data Analyst
> Lexicon Genetics Incorporated
>
>
>
>
>***************************************************************************
>  The contents of this communication are intended only for the addressee and
> may contain confidential and/or privileged material. If you are not the
> intended recipient, please do not read, copy, use or disclose this
> communication and notify the sender.  Opinions, conclusions and other
> information in this communication that do not relate to the official
> business of my company shall be understood as neither given nor endorsed by
> it.
>
>***************************************************************************
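One way to sketch Roy's partial read, assuming a simplified EMBL-style layout in which each line starts with a two-letter tag (`ID`, `AC`, `DE`, ...), the value begins at column 6, and each entry ends with `//`. `PartialFieldReader` and `readFields` are hypothetical names, not BioJava API, and the sketch returns raw strings rather than a `Sequence`. Unrequested tags are skipped without being validated, so a malformed `FT` line cannot abort a read that only asked for `ID` and `DE`; passing `null` keeps every tag, a rough stand-in for the full-read fallback Roy describes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not a BioJava class: reads only the requested
// two-letter fields from one EMBL-style entry.
public class PartialFieldReader {

    // Reads lines up to the next "//" terminator and returns the requested
    // fields in file order. Continuation lines with the same tag are joined
    // with a space. A null field list keeps every tag. Returns null if no
    // lines remain in the reader.
    public static Map<String, String> readFields(BufferedReader br,
                                                 String[] fieldsToBeParsed)
            throws IOException {
        Set<String> wanted = (fieldsToBeParsed == null)
                ? null
                : new HashSet<>(Arrays.asList(fieldsToBeParsed));
        Map<String, String> fields = new LinkedHashMap<>();
        boolean sawLine = false;
        String line;
        while ((line = br.readLine()) != null) {
            if (line.startsWith("//")) {
                return fields; // end of this entry
            }
            sawLine = true;
            if (line.length() < 2) {
                continue; // no tag to read; ignore instead of failing
            }
            String tag = line.substring(0, 2);
            if (wanted != null && !wanted.contains(tag)) {
                continue; // unrequested field: skipped, never validated
            }
            // Assumed layout: the value follows a 5-character "XX   " prefix.
            String value = line.length() > 5 ? line.substring(5).trim() : "";
            fields.merge(tag, value, (a, b) -> a + " " + b);
        }
        return sawLine ? fields : null;
    }

    public static void main(String[] args) throws IOException {
        String data = "ID   SEQ1\n"
                + "AC   X12345;\n"
                + "FT   this feature table line is malformed\n"
                + "DE   first line of description\n"
                + "DE   second line\n"
                + "//\n";
        BufferedReader br = new BufferedReader(new StringReader(data));
        Map<String, String> entry = readFields(br, new String[] {"ID", "DE"});
        // prints {ID=SEQ1, DE=first line of description second line}
        System.out.println(entry);
    }
}
```

Taking `String[]` rather than an `ArrayList` is an arbitrary choice here; a `List<String>` overload could simply call `readFields(br, list.toArray(new String[0]))`.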
>
>
>
> --__--__--
>
> Message: 2
> Date: Fri, 26 Jul 2002 20:52:56 +0200
> From: Kalle Näslund <kalle.naslund@genpat.uu.se>
> To: "Singh, Nimesh" <Nimesh.Singh@maxygen.com>
> Cc: biojava-l@biojava.org
> Subject: Re: [Biojava-l] adding toSequenceIterator method for Alignment
>
> Singh, Nimesh wrote:
>
> >     I've created a class called AlignmentSequenceIterator that I intend to
> >put in the org.biojava.bio.seq package.  It will do the real work.  I've
> >also added
> >        public SequenceIterator sequenceIterator() {
> >            return new AlignmentSequenceIterator(this);
> >        }
> >to each alignment class.  It should work fine in every alignment, because
> >AlignmentSequenceIterator uses the getLabels and symbolListForLabel methods
> >from the Alignment interface.
> >
> >     If this is fine, then I'll upload everything later today.  If you
> >have any suggestions for changes, then let me know.
> >
> >Nimesh
> >
> >
>
> Well, there is one big problem with this piece of code: you treat all
> objects in the alignment as if they were nothing more than SymbolLists,
> which isn't true in practice, as you can insert any object that implements
> the SymbolList interface into an alignment.
>
> For example, I am currently populating my alignment objects with custom
> Sequence objects. If I called this code, it would create new Sequence
> objects of type SimpleSequence, and as I understand it from a quick look
> at the SequenceFactory code, they won't have any of the annotations,
> features etc. that the original Sequence objects I added to the alignment
> had. So, instead of getting my custom Sequence objects back, containing
> features etc., I would get nearly "empty" SimpleSequence objects back,
> which makes it unusable.
> Other problems should pop up if you insert other objects into
> Alignments, say other alignments: instead of getting them back as
> alignments when you iterate over the SymbolLists in the alignment, you
> get them back as SimpleSequences.
>
> But I do agree that adding a method to the Alignment interface that
> gives you an iterator over the SymbolLists in the alignment is a good
> thing to add.
>
> My suggestion would just be to have it iterate over the SymbolLists that
> are inserted into the Alignment and avoid altering the objects in any way.
> That way you get back what you insert, and the method will work for
> everyone, not just people using SimpleSequences.
>
> regards Kalle
>
> >
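> >Here is the code for AlignmentSequenceIterator:
> >
> >package org.biojava.bio.seq;
> >
> >import java.util.Iterator;
> >import java.util.NoSuchElementException;
> >
> >import org.biojava.bio.BioException;
> >import org.biojava.bio.seq.impl.SimpleSequenceFactory;
> >import org.biojava.bio.symbol.Alignment;
> >import org.biojava.bio.symbol.SymbolList;
> >
> >// Wraps each labelled row of an alignment in a new SimpleSequence.
> >public class AlignmentSequenceIterator implements SequenceIterator {
> >    private Alignment align;
> >    private Iterator labels;
> >    private SequenceFactory sf;
> >
> >    public AlignmentSequenceIterator(Alignment align) {
> >        this.align = align;
> >        labels = align.getLabels().iterator();
> >        sf = new SimpleSequenceFactory();
> >    }
> >
> >    public boolean hasNext() {
> >        return labels.hasNext();
> >    }
> >
> >    public Sequence nextSequence() throws NoSuchElementException, BioException {
> >        if (!hasNext()) {
> >            throw new NoSuchElementException("No more sequences in the alignment.");
> >        }
> >        try {
> >            Object label = labels.next();
> >            SymbolList symList = align.symbolListForLabel(label);
> >            // The label doubles as URI and name; the annotation is null,
> >            // so nothing attached to the original row is carried over.
> >            Sequence seq = sf.createSequence(symList, label.toString(), label.toString(), null);
> >            return seq;
> >        } catch (Exception e) {
> >            throw new BioException(e, "Could not read sequence");
> >        }
> >    }
> >}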
> >_______________________________________________
> >Biojava-l mailing list  -  Biojava-l@biojava.org
> >http://biojava.org/mailman/listinfo/biojava-l
> >
> >
>
>
>
>
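Kalle's alternative can be sketched without any conversion step. To stay self-contained, the sketch uses a hypothetical, stripped-down `LabelledAlignment` stand-in exposing only the two methods named in the thread, `getLabels()` and `symbolListForLabel()`; a generic type parameter stands in for BioJava's `SymbolList`, so this is not the real `org.biojava.bio.symbol.Alignment` API. The iterator hands back the very object stored under each label, so a custom `Sequence` or a nested alignment would come back with its own type and contents intact. Returning a plain `Iterator` rather than a `SequenceIterator` is one possible design choice, since not every object in an alignment is a `Sequence`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the two Alignment methods discussed in the thread.
interface LabelledAlignment<T> {
    List<Object> getLabels();
    T symbolListForLabel(Object label) throws NoSuchElementException;
}

// Iterates over the stored objects themselves; nothing is copied or wrapped.
class AlignmentContentIterator<T> implements Iterator<T> {
    private final LabelledAlignment<T> align;
    private final Iterator<Object> labels;

    AlignmentContentIterator(LabelledAlignment<T> align) {
        this.align = align;
        this.labels = align.getLabels().iterator();
    }

    public boolean hasNext() {
        return labels.hasNext();
    }

    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException("No more entries in the alignment.");
        }
        return align.symbolListForLabel(labels.next());
    }
}

// Toy in-memory alignment (hypothetical) used only to exercise the iterator.
class MapAlignment<T> implements LabelledAlignment<T> {
    private final Map<Object, T> rows = new LinkedHashMap<>();

    void put(Object label, T row) {
        rows.put(label, row);
    }

    public List<Object> getLabels() {
        return new ArrayList<>(rows.keySet());
    }

    public T symbolListForLabel(Object label) {
        T row = rows.get(label);
        if (row == null) {
            throw new NoSuchElementException("No row labelled " + label);
        }
        return row;
    }
}

public class AlignmentIteratorDemo {
    public static void main(String[] args) {
        MapAlignment<StringBuilder> aln = new MapAlignment<>();
        StringBuilder custom = new StringBuilder("ACGT");
        aln.put("seq1", custom);
        Iterator<StringBuilder> it = new AlignmentContentIterator<>(aln);
        // The same instance comes back, not a rebuilt copy: prints true
        System.out.println(it.next() == custom);
    }
}
```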
> --__--__--
>
> Message: 3
> Date: Fri, 26 Jul 2002 22:26:30 -0500
> From: Mark Fortner <phidias@mindspring.com>
> To: biojava-l@biojava.org
> Subject: Re: [Biojava-l] SeqIOTools.readXXXXFields() method??
>
> I wonder if it would be worthwhile to have an alphabet-like approach to
> this, where the alphabets are actually field tokens/field names that are
> either statically defined or defined in XML files?  For example,
> you might have entries like
> <field-list name="swissprot">
>     <field name="accession" token="AC"/>
>     <field name="id" token="ID"/>
>     ....
> </field-list>
>
> You could save subsets of these field lists (alphabets) and pass the
> file name to your code at run-time.  If you want more separation of the
> layers of your code, you could keep the file-handling code in another
> class and simply accept an ArrayList of Field objects as the parameter
> to your method.
>
> Mark
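A rough sketch of Mark's suggestion using only the JDK's DOM parser (`javax.xml.parsers`), assuming the `<field-list>`/`<field name=".." token=".."/>` layout shown above; `FieldListLoader` and `Field` are hypothetical names. It parses the XML from a string to stay self-contained; a saved subset could instead be loaded at run-time with `DocumentBuilder.parse(File)`. The resulting `List<Field>` is the kind of parameter Mark suggests handing to the reading method, which keeps file handling out of the parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FieldListLoader {

    // Hypothetical value class: one named field and its flat-file line token.
    public static final class Field {
        final String name;
        final String token;

        Field(String name, String token) {
            this.name = name;
            this.token = token;
        }

        @Override
        public String toString() {
            return name + "=" + token;
        }
    }

    // Collects every <field> element under the root <field-list>, in order.
    public static List<Field> load(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getDocumentElement().getElementsByTagName("field");
        List<Field> fields = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            fields.add(new Field(e.getAttribute("name"), e.getAttribute("token")));
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<field-list name=\"swissprot\">"
                + "<field name=\"accession\" token=\"AC\"/>"
                + "<field name=\"id\" token=\"ID\"/>"
                + "</field-list>";
        // prints [accession=AC, id=ID]
        System.out.println(load(xml));
    }
}
```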
>
>
> --__--__--
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>
>
> End of Biojava-l Digest