[Biojava-l] Parsing circular sequences

Keith James kdj@sanger.ac.uk
Tue, 12 Nov 2002 10:04:18 GMT


>>>>> "Matthew" == Matthew Pocock <matthew_pocock@yahoo.co.uk> writes:

    Matthew> Is there any will to replace the current monolithic
    Matthew> parsers for embl/genbank/swissprot et al. with modular
    Matthew> event-based parsers based upon tag-value? If we did this
    Matthew> then the location parsing module can just listen for
    Matthew> sequence length events. I really have no idea how the
    Matthew> performance of the two approaches would compare, but I'm
    Matthew> willing to help with writing the tag-value embl parser
    Matthew> and benchmarking the result.
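
As a rough illustration of the idea in the quoted paragraph, here is a
minimal sketch of a tag-value event parser in which a location-checking
listener picks up the sequence length from the ID line and then uses it
when feature (FT) lines arrive. Everything in it is hypothetical: the
TagValueListener interface, the parse method, the LocationListener class
and the sample record are made up for this example and are not BioJava's
actual API. The location handling only understands simple "start..end"
ranges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagValueSketch {
    /** Hypothetical listener: receives one event per tagged line. */
    interface TagValueListener {
        void tagValue(String tag, String value);
    }

    /** Splits EMBL-style lines ("ID   ...") into (tag, value) events. */
    static void parse(List<String> lines, TagValueListener... listeners) {
        for (String line : lines) {
            if (line.length() < 2) continue;
            String tag = line.substring(0, 2);
            String value = line.length() > 5 ? line.substring(5).trim() : "";
            for (TagValueListener l : listeners) l.tagValue(tag, value);
        }
    }

    /** Learns the length from the ID line, then checks FT ranges against it. */
    static class LocationListener implements TagValueListener {
        int seqLength = -1; // -1 until an ID line has been seen
        final List<String> outOfRange = new ArrayList<>();
        private static final Pattern LEN = Pattern.compile("(\\d+) BP");
        private static final Pattern RANGE = Pattern.compile("(\\d+)\\.\\.(\\d+)");

        public void tagValue(String tag, String value) {
            if (tag.equals("ID")) {
                Matcher m = LEN.matcher(value);
                if (m.find()) seqLength = Integer.parseInt(m.group(1));
            } else if (tag.equals("FT") && !value.startsWith("/") && seqLength >= 0) {
                // Feature key lines only; qualifier lines start with "/".
                Matcher m = RANGE.matcher(value);
                while (m.find()) {
                    if (Integer.parseInt(m.group(2)) > seqLength) {
                        outOfRange.add(m.group());
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        // Made-up record, for illustration only.
        List<String> record = Arrays.asList(
            "ID   PHIX174 standard; circular DNA; PHG; 5386 BP.",
            "FT   CDS             100..200",
            "FT   CDS             5300..5400",
            "//");
        LocationListener locations = new LocationListener();
        parse(record, locations);
        System.out.println(locations.seqLength + " " + locations.outOfRange);
    }
}
```

On a circular sequence, a range that runs past the reported length is
exactly the case the location module would need to wrap or reject, which
is why it has to hear the length event before the feature events.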

I started on a hybrid EMBL parser which combined tag-value parsing
with JFlex/CUP for the feature table, but gave it up for more
interesting things. (It was a real drag trying to get conflicts in
the feature table BNF to resolve, and then there are the syntax
errors in the DB itself.)

I'd help with this. I'm messing with the same thing in Lisp, so it
would be an interesting exercise. (Dammit! I *swore* I'd never do
another EMBL parser!)

Keith

-- 

- Keith James <kdj@sanger.ac.uk> bioinformatics programming support -
- Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, UK -