[Biojava-l] SeqIOTools.readXXXXFields() method??

Mark Fortner phidias@mindspring.com
Fri, 26 Jul 2002 22:26:30 -0500


I wonder if it would be worthwhile to take an alphabet-like approach to 
this, where the "alphabets" are actually lists of field tokens/field names 
that are either statically defined or defined in XML files.  For example, 
you might have entries like:
<field-list name="swissprot">
    <field name="accession" token="AC"/>
    <field name="id" token="ID"/>
    ....
</field-list>
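
A rough sketch of how such a file might be loaded, using only the standard 
JAXP DOM parser.  The Field and FieldListLoader classes are hypothetical 
names made up for illustration; nothing like them exists in BioJava as far 
as I know, and the element/attribute names simply follow the example above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical value class: pairs a readable field name with its flat-file token.
final class Field {
    final String name;
    final String token;

    Field(String name, String token) {
        this.name = name;
        this.token = token;
    }
}

// Hypothetical loader (not a BioJava class): reads <field> elements into Field objects.
final class FieldListLoader {

    // Parses a <field-list> document and returns its fields in document order.
    static List<Field> load(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);
        NodeList nodes = doc.getElementsByTagName("field");
        List<Field> fields = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            fields.add(new Field(e.getAttribute("name"), e.getAttribute("token")));
        }
        return fields;
    }

    // Convenience overload for XML held in a String (handy for testing).
    static List<Field> loadFromString(String xml) throws Exception {
        return load(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

In real use you would pass a FileInputStream for whichever field-list file 
was named at run-time.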

You could save subsets of these field lists (alphabets) and pass the 
file name to your code at run-time.  If you want more separation between 
the layers of your code, you could keep the file-handling code in another 
class and simply accept an ArrayList of Field objects as the parameter 
to your method.
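
To make the second idea concrete, here is a minimal sketch of a reader that 
takes the list of Field objects and only looks at lines whose leading token 
was requested, so a malformed line for some other field is never parsed.  
Field and PartialRecordReader are hypothetical names, not BioJava classes; 
the sketch assumes each line starts with its token followed by whitespace 
and that a record ends at a "//" line, as in EMBL/Swiss-Prot style files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value class, repeated here so the sketch is self-contained.
final class Field {
    final String name;
    final String token;

    Field(String name, String token) {
        this.name = name;
        this.token = token;
    }
}

// Hypothetical reader (not a BioJava class): extracts only the requested fields.
final class PartialRecordReader {

    // Reads one record and returns field name -> raw line contents for the
    // requested fields only. Lines with any other token are skipped unparsed.
    static Map<String, List<String>> readFields(BufferedReader reader, List<Field> wanted)
            throws IOException {
        Map<String, String> nameByToken = new HashMap<>();
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Field f : wanted) {
            nameByToken.put(f.token, f.name);
            result.put(f.name, new ArrayList<>());
        }
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("//")) {
                break; // end of this record
            }
            // Split into the leading token and the rest of the line.
            String[] parts = line.split("\\s+", 2);
            String name = nameByToken.get(parts[0]);
            if (name == null) {
                continue; // not a requested field, so ignore it entirely
            }
            result.get(name).add(parts.length > 1 ? parts[1].trim() : "");
        }
        return result;
    }
}
```

The map values are left as raw strings on purpose; any field-specific 
parsing (and any exception it might throw) would then apply only to the 
fields you actually asked for.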

Mark

Roy Park wrote:

>Hello everyone.
>
>I deal with a number of pseudo EMBL/GenBank formatted sequences, and it
>would be extremely nice (?) to have methods that only attempt to parse out
>specified fields.
>
>The primary reason for this is that, right now, format.readSequence()
>throws BioException far too frequently for my purposes - i.e. although I only
>need the fields X, Y and Z from each sequence definition, readSequence()
>throws an exception when it finds the field W to be malformed, etc.
>
>I see that modified versions of the StreamReader class, the SequenceFormat
>implementing classes, etc. have to be written, which I can do.  I'm wondering
>if anyone could suggest a preferred way of passing the desired fields to be
>read.
>
>readXXXXFields(BufferedReader _br, ArrayList(of String) _fieldsToBeParsed)..
>or
>readXXXXFields(BufferedReader _br, String[] _fieldsToBeParsed)..etc.
>
>(I think the readXXXXX(BufferedReader) should be called if the second
>argument is null.)
>
>Any input would be greatly appreciated.  (what about the naming of the
>methods - readXXXXPartial()??)
>
>Roy K. Park
>Bioinformatics Data Analyst
>Lexicon Genetics Incorporated