[Biojava-l] SeqIOTools.readXXXXFields() method??

Matthew Pocock matthew_pocock@yahoo.co.uk
Sat, 27 Jul 2002 17:22:14 +0100 (BST)


Hi Roy,

In principle, it should be possible to plug together a
chain of parsers that just grab the fields you want
and ignore the rest. In practice, parsers for formats
like embl and genbank have been written in a more
monolithic manner. If you just want the field info and
don't necessarily want a blessed biojava sequence
object out the end, you should be able to knock
something up in a few minutes using
org.biojava.bio.program.tagvalue, but I don't have a
small demo script with me right now to show you how.
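
To make the "grab only the fields you want and ignore the rest" idea
concrete, here is a minimal sketch in plain Java. It deliberately does not
use org.biojava.bio.program.tagvalue, so nothing in it should be read as
that package's API. It assumes EMBL-style records: a two-letter line code
at the start of each line, the value after it, and "//" ending a record.
FieldGrabber and readFields are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of BioJava.
public class FieldGrabber {

    /**
     * Reads one record (up to "//" or end of input) and keeps only the
     * lines whose two-letter code is in 'wanted'. A null 'wanted' keeps
     * every line code. Returns null when no record is left to read.
     * Lines with other codes are skipped without being interpreted, so a
     * malformed field outside 'wanted' cannot cause a failure.
     */
    public static Map<String, List<String>> readFields(BufferedReader in,
                                                       Set<String> wanted) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        boolean sawLine = false;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;                 // ignore blank lines
                }
                sawLine = true;
                if (line.startsWith("//")) {
                    return fields;            // end of this record
                }
                if (line.length() < 2) {
                    continue;                 // too short to carry a line code
                }
                String tag = line.substring(0, 2);
                if (wanted != null && !wanted.contains(tag)) {
                    continue;                 // not requested: skip unparsed
                }
                String value = line.substring(2).trim();
                fields.computeIfAbsent(tag, k -> new ArrayList<>()).add(value);
            }
        } catch (IOException e) {
            // Wrapped so callers do not need a checked-exception handler.
            throw new UncheckedIOException(e);
        }
        return sawLine ? fields : null;       // null means no more records
    }

    public static void main(String[] args) {
        // Made-up input; the one-character line "W" stands in for a bad field.
        String data = "ID   X1\nW\nDE   first\nDE   second\n     acgt\n//\n";
        BufferedReader in = new BufferedReader(new StringReader(data));
        Set<String> wanted = new HashSet<>(Arrays.asList("ID", "DE"));
        Map<String, List<String>> record;
        while ((record = readFields(in, wanted)) != null) {
            System.out.println(record);
        }
    }
}
```

Because unrequested lines are never interpreted, the reader only has to
agree with the file on line codes and the record terminator. GenBank-style
keywords would need a different way of splitting tag from value.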

In the future, it would be nice to see if it is
possible to implement the sequence IO interfaces using
tagvalue parsers (perhaps a small adapter object).

I should really write a tagvalue tutorial.

Matthew

--- Roy Park <RPark@lexgen.com> wrote:
> Hello everyone.
> 
> I deal with a number of pseudo EMBL/GenBank
> formatted sequences, and it
> would be extremely nice (?) to have methods that
> only attempt to parse out
> specified fields.
> 
> The primary reason for this is that, right now, the
> format.readSequence()
> throws BioException way too frequently for my
> purpose - i.e. although I only
> need the fields X, Y and Z from each sequence
> definition, the readSequence()
> throws exception where it finds the field W to be
> mal-formed, etc.
> 
> I see that modified versions of the StreamReader
> class, the SequenceFormat
> implementing classes, etc. have to be written, which
> I can do.  I'm wondering
> if anyone could suggest a preferred way of passing
> the desired fields to be
> read.
> 
> readXXXXFields(BufferedReader _br, ArrayList(of
> String) _fieldsToBeParsed)..
> or
> readXXXXFields(BufferedReader _br, String[]
> _fieldsToBeParsed)..etc.
> 
> (I think the readXXXXX(BufferedReader) should be
> called if the second
> argument is null.)
> 
> Any input would be greatly appreciated.  (what about
> the naming of the
> methods - readXXXXPartial()??)
> 
> Roy K. Park
> Bioinformatics Data Analyst
> Lexicon Genetics Incorporated
