[Biojava-dev] [Biojava-l] File parsing in BJ3

Richard Holland dicknetherlands at gmail.com
Tue Oct 21 11:14:29 UTC 2008


For now, yes, it's empty. But I can envisage situations where it might be
nice to have Thing declare some common methods (e.g. isMachineGenerated(),
isManuallyCurated(), etc.). I'd rather have it there now as a placeholder
for future expansion than have to re-engineer everything should we identify
a need for common methods in future.

You'll see that Thing already extends Serializable, implying that all Things
must be able to persist to an object backing store. Serializable itself is
also an empty interface!

Also I like the idea of having Thing, not Object, as a kind of marker of
intention. To me, code reads more clearly when it avoids Object wherever
possible. Thing may not be any cleverer than Object, but it immediately
tells the reader what kind of Object to expect.
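
To make the "marker of intention" point concrete, here is a minimal sketch.
Only Thing and its Serializable parent reflect the actual code; Fasta, Note,
Tagged, countThings() and isTagged() are names I've made up purely for
illustration. It shows that an empty interface can still act as a generic
bound that the compiler checks, whereas an annotation can only be inspected
by reflection at run time.

```java
import java.io.Serializable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.List;

// Marker interface: declares no methods, just like Serializable itself.
interface Thing extends Serializable {}

// Hypothetical annotation-based alternative, for comparison only.
@Retention(RetentionPolicy.RUNTIME)
@interface Tagged {}

// Hypothetical data class that opts in via the interface.
class Fasta implements Thing {
    private static final long serialVersionUID = 1L;
    final String sequence;
    Fasta(String sequence) { this.sequence = sequence; }
}

// Hypothetical class that is annotated but is not a Thing.
@Tagged
class Note {
    final String text;
    Note(String text) { this.text = text; }
}

public class MarkerSketch {
    // The bound is enforced at compile time: a List<Note> is rejected here.
    static <T extends Thing> int countThings(List<T> things) {
        return things.size();
    }

    // An annotation can only be checked reflectively, at run time.
    static boolean isTagged(Object o) {
        return o.getClass().isAnnotationPresent(Tagged.class);
    }

    public static void main(String[] args) {
        List<Fasta> records = new ArrayList<Fasta>();
        records.add(new Fasta("ACGT"));
        System.out.println(countThings(records));         // prints 1
        System.out.println(isTagged(new Note("hello")));  // prints true
        System.out.println(isTagged(records.get(0)));     // prints false
        // countThings(new ArrayList<Note>());  // would not compile
    }
}
```

The commented-out last line is the whole argument: with the interface, the
mistake is caught by the compiler rather than by a run-time check.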


2008/10/21 Mark Schreiber <markjschreiber at gmail.com>

> Is there any need for Thing at all? Can't a builder be typed to produce
> something that extends Object?
>
> If Thing provides no behaviour contract or meta-information then why
> does it exist?
>
> - Mark
>
> On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
> > Depends on what you want to program. If you want to have a collection of
> > objects which are Things & perform a common action on them then
> > annotations are not the way forward.
> >
> > If you want to have some kind of meta-programming occurring & need a
> > class to be multiple things then annotations are right. There is
> > currently no way to enforce compile time dependencies on annotations &
> > my thinking is that this is right. Annotations should be meta data or
> > provide a way to alter a class in a non-invasive way (think Web Service
> > annotations creating WS Servers & Clients without any alteration of the
> > class).
> >
> > Andy
> >
> > Richard Holland wrote:
> >> Spot on.
> >>
> >> Annotation/interface... I think Annotation is probably better, as you
> >> suggest, but I'd have to look into that. Not sure how it works with
> >> collections and generics. If it does turn out to be a better bet, I'll
> >> change it over.
> >>
> >> With the BioSQL dependencies, take a look at the pom.xml file inside the
> >> biojava-dna module. It declares a dependency on biojava-core. If you
> >> want to add dependencies to external JARs, take a look at
> >> biojava-biosql's pom.xml to see how it depends on javax.persistence.
> >> (The easiest way to add these is via an IDE such as NetBeans, which is
> >> what I'm using at the moment.)
> >>
> >> cheers,
> >> Richard
> >>
> >> 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
> >>
> >>> So if I want to build a BioSQL loader from Genbank then would the
> >>> classes (or their wrappers) in the BioSQL Entity package need to
> >>> implement Thing?  Would Maven have an issue with that or would it just
> >>> create a dependency on core? (You can tell I've never used Maven,
> >>> right?)
> >>>
> >>> From a design point of view should Thing be an interface or an
> >>> Annotation? The reason I ask is that it doesn't define any methods so
> >>> it is more of a tag than an interface.
> >>>
> >>> Anyway, my understanding is that I would use a Genbank parser (or
> >>> write one). Write an EntityReceiver interface (probably more than one,
> >>> given the number of entities in BioSQL), implement an EntityBuilder
> >>> (again possibly more than one) that implements EntityReceiver and
> >>> builds Entity beans from messages it receives. In this case I probably
> >>> wouldn't provide a writer as JPA would be writing the beans to the
> >>> database.  Would this be how you imagine it?
> >>>
> >>> - Mark
> >>>
> >>>
> >>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland
> >>> <holland at eaglegenomics.com> wrote:
> >>>> (From now on I will only be posting these development messages to
> >>>> biojava-dev, which is the intended purpose of that list. Those of you
> >>>> who wish to keep track of things but are currently only subscribed to
> >>>> biojava-l should also subscribe to biojava-dev in order to keep up to
> >>>> date.)
> >>>>
> >>>> As promised, I've committed a new package in the biojava-core module
> >>>> that should help understand how to do file parsing and conversion and
> >>>> writing in the new BJ3 modules. Here's an example of how to use it to
> >>>> write a Genbank parser (note no parsers actually exist yet!):
> >>>>
> >>>> 1. Design yourself a Genbank class which implements the interface
> >>>> Thing and can fully represent all the data that might possibly occur
> >>>> inside a Genbank file.
> >>>>
> >>>> 2. Write an interface called GenbankReceiver, which extends
> >>>> ThingReceiver and defines all the methods you might need in order to
> >>>> construct a Genbank object in an asynchronous fashion.
> >>>>
> >>>> 3. Write a GenbankBuilder class which implements GenbankReceiver and
> >>>> ThingBuilder. Its job is to receive data via method calls, use that
> >>>> data to construct a Genbank object, then provide that object on demand.
> >>>>
> >>>> 4. Write a GenbankWriter class which implements GenbankReceiver and
> >>>> ThingWriter. Its job is similar to GenbankBuilder, but instead of
> >>>> constructing new Genbank objects, it writes Genbank records to file
> >>>> that reflect the data it receives.
> >>>>
> >>>> 5. Write a GenbankReader class which implements ThingReader. It can
> >>>> read Genbank files and output the data to the methods of the
> >>>> ThingReceiver provided to it, which in this case could be anything
> >>>> which implements the interface GenbankReceiver.
> >>>>
> >>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It
> >>>> takes a Genbank object and will fire off data from it to the provided
> >>>> ThingReceiver (a GenbankReceiver instance) as if the Genbank object
> >>>> was being read from a file or some other source.
> >>>>
> >>>> That's it! OK so it's a minimum of 6 classes instead of the original 1
> >>>> or 2, but the additional steps are necessary for flexibility in
> >>>> converting between formats.
> >>>>
> >>>> Now to use it (you'll probably want a GenbankTools class to wrap these
> >>>> steps up for user-friendliness, including various options for opening
> >>>> files, etc.):
> >>>>
> >>>> 1. To read a file - instantiate ThingParser with your GenbankReader as
> >>>> the reader, and GenbankBuilder as the receiver. Use the iterator
> >>>> methods on ThingParser to get the objects out.
> >>>>
> >>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter
> >>>> wrapping your Genbank object, and a GenbankWriter as the receiver. Use
> >>>> the parseAll() method on the ThingParser to dump the whole lot to your
> >>>> chosen output.
> >>>>
> >>>> The clever bit comes when you want to convert between files. Imagine
> >>>> you've done all the above for Genbank, and you've also done it for
> >>>> FASTA. How to convert between them? What you need to do is this:
> >>>>
> >>>> 1. Implement all the classes for both Genbank and FASTA.
> >>>>
> >>>> 2. Write a GenbankFASTAConverter class that implements
> >>>> ThingConverter<FASTA> and GenbankReceiver, and will internally convert
> >>>> the data received and pass it on out to the receiver provided, which
> >>>> will be a FASTAReceiver instance.
> >>>>
> >>>> 3. Write a FASTAGenbankConverter class that operates in exactly the
> >>>> opposite way, implementing ThingConverter<Genbank> and FASTAReceiver.
> >>>>
> >>>> Then to convert you use ThingParser again:
> >>>>
> >>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with a
> >>>> FASTAReader reader, a GenbankBuilder receiver, and add a
> >>>> FASTAGenbankConverter instance to the converter chain. Use the
> >>>> iterator to get your Genbank objects out of your FASTA file.
> >>>>
> >>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a
> >>>> GenbankWriter instead and use parseAll() instead of the iterator
> >>>> methods.
> >>>>
> >>>> 3. From FASTA object to Genbank object: Same as option 1, but provide
> >>>> a FASTAEmitter wrapping your FASTA object as the reader instead.
> >>>>
> >>>> 4. From FASTA object to Genbank file: Same as option 1, but swap both
> >>>> the reader and the receiver as per options 2 and 3.
> >>>>
> >>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all
> >>>> mentions of FASTA and Genbank, and use GenbankFASTAConverter instead.
> >>>>
> >>>> One last and very important feature of this approach is that if you
> >>>> discover that nobody has written the appropriate converter for your
> >>>> chosen pair of formats A and C, but converters do exist to map A to
> >>>> some other format B and that other format B on to C, then you can just
> >>>> put the two converters A-B and B-C into the ThingParser chain and
> >>>> it'll work perfectly.
> >>>>
> >>>> Enjoy!
> >>>>
> >>>> cheers,
> >>>> Richard
> >>>>
> >>>> --
> >>>> Richard Holland, BSc MBCS
> >>>> Finance Director, Eagle Genomics Ltd
> >>>> M: +44 7500 438846 | E: holland at eaglegenomics.com
> >>>> http://www.eaglegenomics.com/
> >>>>
> >>
> >>
> >>
> >
>
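
For anyone trying to picture the FASTA-to-Genbank route described in the
quoted walkthrough, here's a rough sketch of the receiver/converter/builder
idea. Please treat every signature in it as an assumption: the method names
on FASTAReceiver and GenbankReceiver, the two-field Genbank class and the
fastaToGenbank() helper are invented for illustration, and the pieces are
wired together by hand rather than through ThingParser, so this is not the
committed biojava-core API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Assumed event methods for the FASTAReceiver role.
interface FASTAReceiver {
    void startRecord();
    void header(String text);
    void sequence(String residues);
    void endRecord();
}

// Assumed event methods for the GenbankReceiver role, cut down to two fields.
interface GenbankReceiver {
    void startRecord();
    void locus(String name);
    void origin(String residues);
    void endRecord();
}

// Hypothetical minimal data class (the real one would implement Thing).
class Genbank {
    String locus;
    String origin;
}

// Builder role: receives events, assembles objects, provides them on demand.
class GenbankBuilder implements GenbankReceiver {
    private final List<Genbank> built = new ArrayList<Genbank>();
    private Genbank current;
    public void startRecord() { current = new Genbank(); }
    public void locus(String name) { current.locus = name; }
    public void origin(String residues) { current.origin = residues; }
    public void endRecord() { built.add(current); current = null; }
    List<Genbank> getBuilt() { return built; }
}

// Converter role: a FASTAReceiver upstream that drives a GenbankReceiver.
class FASTAGenbankConverter implements FASTAReceiver {
    private final GenbankReceiver out;
    FASTAGenbankConverter(GenbankReceiver out) { this.out = out; }
    public void startRecord() { out.startRecord(); }
    // Use the first word of the FASTA header as the locus name.
    public void header(String text) { out.locus(text.split("\\s+")[0]); }
    public void sequence(String residues) {
        out.origin(residues.toLowerCase(Locale.ROOT));
    }
    public void endRecord() { out.endRecord(); }
}

// Reader role: parses text and fires events at whatever receiver it is given.
class FASTAReader {
    void read(String text, FASTAReceiver receiver) {
        StringBuilder seq = null;
        for (String line : text.split("\n")) {
            if (line.startsWith(">")) {
                if (seq != null) finish(seq, receiver);
                receiver.startRecord();
                receiver.header(line.substring(1).trim());
                seq = new StringBuilder();
            } else if (seq != null) {
                seq.append(line.trim());
            }
        }
        if (seq != null) finish(seq, receiver);
    }
    private void finish(StringBuilder seq, FASTAReceiver receiver) {
        receiver.sequence(seq.toString());
        receiver.endRecord();
    }
}

public class ConverterSketch {
    // Hand-wired chain: reader -> converter -> builder.
    static List<Genbank> fastaToGenbank(String fasta) {
        GenbankBuilder builder = new GenbankBuilder();
        new FASTAReader().read(fasta, new FASTAGenbankConverter(builder));
        return builder.getBuilt();
    }

    public static void main(String[] args) {
        String fasta = ">seq1 test\nACGT\nTTGA\n>seq2\nGGCC\n";
        for (Genbank g : fastaToGenbank(fasta)) {
            System.out.println(g.locus + " " + g.origin);
        }
        // prints "seq1 acgtttga" then "seq2 ggcc"
    }
}
```

Because the converter is itself just a receiver, swapping the builder for a
writer, or putting a second converter in front of it, only changes what gets
passed to its constructor. That's the property that makes the A-B, B-C
chaining possible.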



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


