[Biojava-dev] [Biojava-l] File parsing in BJ3

Andy Yates ayates at ebi.ac.uk
Tue Oct 21 14:32:45 UTC 2008


If "Thing" has gone, what impact does this have on the remaining
classes? Consider methods like canReadNextThing() & readNextThing();
should these be canReadNext() & readNext()?

Just an idle thought ....

Andy
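The renaming question is easier to judge with a concrete reader in front of you. The sketch below assumes a pull-style reader that pushes each record into a receiver, and shows what the shorter canReadNext()/readNext() names might look like. RecordReader, LineReceiver, LineRecordReader and ReaderDemo are hypothetical names invented for illustration and are not part of the BJ3 code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical receiver: with Thing gone, nothing forces a common supertype here.
interface LineReceiver {
    void receiveLine(String line);
}

// Hypothetical reader contract using canReadNext()/readNext()
// in place of canReadNextThing()/readNextThing().
interface RecordReader<R> {
    void setReceiver(R receiver);
    boolean canReadNext() throws IOException;
    void readNext() throws IOException;
}

// Toy implementation: every non-blank line counts as one record.
class LineRecordReader implements RecordReader<LineReceiver> {
    private final BufferedReader in;
    private LineReceiver receiver;
    private String pending; // record read ahead by canReadNext(), not yet delivered

    LineRecordReader(BufferedReader in) {
        this.in = in;
    }

    @Override
    public void setReceiver(LineReceiver receiver) {
        this.receiver = receiver;
    }

    @Override
    public boolean canReadNext() throws IOException {
        while (pending == null) {
            String line = in.readLine();
            if (line == null) {
                return false; // end of input
            }
            if (!line.trim().isEmpty()) {
                pending = line; // blank lines are skipped
            }
        }
        return true;
    }

    @Override
    public void readNext() throws IOException {
        if (!canReadNext()) {
            throw new IllegalStateException("no more records");
        }
        receiver.receiveLine(pending);
        pending = null;
    }
}

class ReaderDemo {
    // Drains a reader, collecting every record it pushes to the receiver.
    static List<String> readAll(String text) {
        List<String> records = new ArrayList<>();
        LineRecordReader reader = new LineRecordReader(new BufferedReader(new StringReader(text)));
        reader.setReceiver(records::add);
        try {
            while (reader.canReadNext()) {
                reader.readNext();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

Because the receiver type is carried by the generic parameter, the method names themselves no longer need to say what is being read.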

Richard Holland wrote:
> The two examples I gave would be better as annotations, it's true.
> Serializable, and Cloneable for that matter, would definitely work better
> that way.
> 
> Well, we could do away with Thing altogether then. I'll update the code.
> 
> 
> 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
> 
>> Depending on what you want them for, isMachineGenerated() and
>> isManuallyCurated() would possibly be better as annotations
>> (@MachineGenerated, @ManuallyCurated). This is true metadata.
>>
>> If Java had had annotations in version 1.1, Serializable would
>> probably also be an annotation. I would agree with the idea that
>> ThingBuilder etc. should be typed on <T extends Serializable>.
>>
>> - Mark
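A minimal sketch of Mark's suggestion, assuming runtime-retained annotations and a builder bounded by Serializable rather than by Thing. MachineGenerated, ManuallyCurated, CuratedEntry, CuratedEntryBuilder, Provenance and getFinishedObject() are all hypothetical names used only for illustration.

```java
import java.io.Serializable;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical metadata annotations. RUNTIME retention keeps them visible
// to reflection after compilation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface MachineGenerated {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ManuallyCurated {}

// Hypothetical builder typed on Serializable instead of a Thing marker.
interface ThingBuilder<T extends Serializable> {
    T getFinishedObject();
}

@ManuallyCurated
class CuratedEntry implements Serializable {
    private static final long serialVersionUID = 1L;
    final String accession;

    CuratedEntry(String accession) {
        this.accession = accession;
    }
}

class CuratedEntryBuilder implements ThingBuilder<CuratedEntry> {
    private String accession;

    void setAccession(String accession) {
        this.accession = accession;
    }

    @Override
    public CuratedEntry getFinishedObject() {
        return new CuratedEntry(accession);
    }
}

class Provenance {
    // Metadata is looked up at run time; nothing is enforced at compile time.
    static boolean isManuallyCurated(Object o) {
        return o.getClass().isAnnotationPresent(ManuallyCurated.class);
    }

    static boolean isMachineGenerated(Object o) {
        return o.getClass().isAnnotationPresent(MachineGenerated.class);
    }
}
```

The compiler cannot force a class to carry either annotation; the check only happens at run time, via reflection.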
>>
>> On Tue, Oct 21, 2008 at 7:14 PM, Richard Holland
>> <dicknetherlands at gmail.com> wrote:
>>> For now, yes it's empty. But I can envisage situations where it might be
>>> nice to have Thing implement some common methods (e.g.
>>> isMachineGenerated(), isManuallyCurated(), etc.). I'd rather have it
>>> there now as a placeholder for future expansion than have to
>>> re-engineer everything should we identify a need for common functions
>>> in future.
>>>
>>> You'll see that Thing already extends Serializable, implying that all
>>> Things must be able to persist to an object backing store. Serializable
>>> itself is also an empty interface!
>>>
>>> Also I like the idea of having Thing, not Object, as a kind of marker of
>>> intention. To me, code is clearer to read when Object is avoided
>>> wherever possible. Thing may not be any more clever than Object, but it
>>> immediately declares an intention when reading code as to what kind of
>>> Object should be expected.
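A minimal sketch of the marker-interface idea Richard describes. It assumes a toy Genbank class and a hypothetical ThingStore that serializes to an in-memory byte array, standing in for a real object backing store.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Marker interface as described: no methods of its own, but it extends
// Serializable, so every Thing can go to an object backing store.
interface Thing extends Serializable {}

// Hypothetical toy record type standing in for a real Genbank class.
class Genbank implements Thing {
    private static final long serialVersionUID = 1L;
    final String accession;
    final String sequence;

    Genbank(String accession, String sequence) {
        this.accession = accession;
        this.sequence = sequence;
    }
}

// Hypothetical store. Taking Thing rather than Object declares intent, and the
// compiler rejects non-Things: store(new StringBuilder()) would not compile.
class ThingStore {
    static byte[] store(Thing thing) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(thing);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Thing load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Thing) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The parameter type is Thing rather than Object, so passing an arbitrary object is a compile error. That is the "declared intention" benefit in concrete form.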
>>>
>>>
>>> 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
>>>> Is there any need for Thing at all? Can't a builder be typed to produce
>>>> something that extends Object?
>>>>
>>>> If Thing provides no behaviour contract or meta-information then why
>>>> does it exist?
>>>>
>>>> - Mark
>>>>
>>>> On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
>>>>> Depends on what you want to program. If you want to have a collection of
>>>>> objects which are Things & perform a common action on them then
>>>>> annotations are not the way forward.
>>>>>
>>>>> If you want to have some kind of meta-programming occurring & need a
>>>>> class to be multiple things then annotations are right. There is
>>>>> currently no way to enforce compile time dependencies on annotations &
>>>>> my thinking is that this is right. Annotations should be meta data or
>>>>> provide a way to alter a class in a non-invasive way (think Web Service
>>>>> annotations creating WS Servers & Clients without any alteration of the
>>>>> class).
>>>>>
>>>>> Andy
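Andy's first case, a collection of Things with a common action, needs a method on the type so the compiler can check each call. A minimal sketch, assuming Thing gains an isMachineGenerated() method (the name comes from earlier in the thread; its meaning here is assumed), with PredictedGene, CuratedGene and ThingFilters as hypothetical toy classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: Thing gains a method, so every Thing can be asked the same question.
interface Thing {
    boolean isMachineGenerated(); // name taken from the thread; meaning assumed
}

class PredictedGene implements Thing {
    @Override
    public boolean isMachineGenerated() {
        return true; // e.g. produced by a gene-prediction pipeline
    }
}

class CuratedGene implements Thing {
    @Override
    public boolean isMachineGenerated() {
        return false; // e.g. reviewed by a curator
    }
}

class ThingFilters {
    // The common action: an ordinary, compiler-checked call on each element.
    static List<Thing> curatedOnly(List<? extends Thing> things) {
        List<Thing> kept = new ArrayList<>();
        for (Thing t : things) {
            if (!t.isMachineGenerated()) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

With an annotation instead, the same filter would need reflection and could not be checked by the compiler, which is the trade-off Andy describes.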
>>>>>
>>>>> Richard Holland wrote:
>>>>>> Spot on.
>>>>>>
>>>>>> Annotation/interface... I think Annotation is probably better as you
>>>>>> suggest, but I'd have to look into that. Not sure how it works with
>>>>>> collections and generics. If it does turn out to be a better bet, I'll
>>>>>> change it over.
>>>>>>
>>>>>> With the BioSQL dependencies, take a look at the pom.xml file inside the
>>>>>> biojava-dna module. It declares a dependency on biojava-core. If you
>>>>>> want to add dependencies to external JARs, take a look at
>>>>>> biojava-biosql's pom.xml to see how it depends on javax.persistence.
>>>>>> (The easiest way to add these is via an IDE such as NetBeans, which is
>>>>>> what I'm using at the moment.)
>>>>>>
>>>>>> cheers,
>>>>>> Richard
>>>>>>
>>>>>> 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
>>>>>>
>>>>>>> So if I want to build a BioSQL loader from Genbank, would the classes
>>>>>>> (or their wrappers) in the BioSQL Entity package need to implement
>>>>>>> Thing? Would Maven have an issue with that, or would it just create a
>>>>>>> dependency on core? (you can tell I've never used Maven right).
>>>>>>>
>>>>>>> From a design point of view, should Thing be an interface or an
>>>>>>> Annotation? The reason I ask is that it doesn't define any methods, so
>>>>>>> it is more of a tag than an interface.
>>>>>>>
>>>>>>> Anyway, my understanding is that I would use a Genbank parser (or
>>>>>>> write one), write an EntityReceiver interface (probably more than one,
>>>>>>> given the number of entities in BioSQL), and implement an EntityBuilder
>>>>>>> (again, possibly more than one) that implements EntityReceiver and
>>>>>>> builds Entity beans from the messages it receives. In this case I
>>>>>>> probably wouldn't provide a writer, as JPA would be writing the beans
>>>>>>> to the database. Would this be how you imagine it?
>>>>>>>
>>>>>>> - Mark
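Mark's plan might look roughly like the following sketch. EntityReceiver and EntityBuilder follow the names in his message. BioEntry, its fields and the three message methods are hypothetical. The bean carries no JPA annotations, so it compiles without javax.persistence, and finished beans are collected in a list where real code would hand them to JPA.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical receiver for one BioSQL entity type; a real loader would
// likely need several, one per group of related entities.
interface EntityReceiver {
    void startEntry(String accession);
    void addSequence(String residues);
    void endEntry();
}

// Stand-in for a JPA entity bean; annotations such as @Entity are omitted
// so the sketch compiles without javax.persistence on the classpath.
class BioEntry {
    String accession;
    final StringBuilder residues = new StringBuilder();
}

// Builds beans from the messages a Genbank parser (not shown) would send.
// No writer is needed: in real use each finished bean would be passed to
// JPA (e.g. EntityManager.persist) rather than collected in a list.
class EntityBuilder implements EntityReceiver {
    private final List<BioEntry> finished = new ArrayList<>();
    private BioEntry current;

    @Override
    public void startEntry(String accession) {
        current = new BioEntry();
        current.accession = accession;
    }

    @Override
    public void addSequence(String residues) {
        current.residues.append(residues); // sequence may arrive in chunks
    }

    @Override
    public void endEntry() {
        finished.add(current);
        current = null;
    }

    List<BioEntry> getFinished() {
        return finished;
    }
}
```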
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland
>>>>>>> <holland at eaglegenomics.com> wrote:
>>>>>>>> (From now on I will only be posting these development messages to
>>>>>>>> biojava-dev, which is the intended purpose of that list. Those of you
>>>>>>>> who wish to keep track of things but are currently only subscribed to
>>>>>>>> biojava-l should also subscribe to biojava-dev in order to keep up to
>>>>>>>> date.)
>>>>>>>>
>>>>>>>> As promised, I've committed a new package in the biojava-core module
>>>>>>>> that should help you understand how to do file parsing, conversion and
>>>>>>>> writing in the new BJ3 modules. Here's an example of how to use it to
>>>>>>>> write a Genbank parser (note that no parsers actually exist yet!):
>>>>>>>>
>>>>>>>> 1. Design yourself a Genbank class which implements the interface
>>>>>>>> Thing and can fully represent all the data that might possibly occur
>>>>>>>> inside a Genbank file.
>>>>>>>>
>>>>>>>> 2. Write an interface called GenbankReceiver, which extends
>>>>>>>> ThingReceiver and defines all the methods you might need in order to
>>>>>>>> construct a Genbank object in an asynchronous fashion.
>>>>>>>>
>>>>>>>> 3. Write a GenbankBuilder class which implements GenbankReceiver and
>>>>>>>> ThingBuilder. Its job is to receive data via method calls, use that
>>>>>>>> data to construct a Genbank object, then provide that object on
>>>>>>>> demand.
>>>>>>>>
>>>>>>>> 4. Write a GenbankWriter class which implements GenbankReceiver and
>>>>>>>> ThingWriter. Its job is similar to GenbankBuilder's, but instead of
>>>>>>>> constructing new Genbank objects, it writes Genbank records to file
>>>>>>>> that reflect the data it receives.
>>>>>>>>
>>>>>>>> 5. Write a GenbankReader class which implements ThingReader. It can
>>>>>>>> read Genbank files and output the data to the methods of the
>>>>>>>> ThingReceiver provided to it, which in this case could be anything
>>>>>>>> which implements the interface GenbankReceiver.
>>>>>>>>
>>>>>>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It
>>>>>>>> takes a Genbank object and will fire off data from it to the provided
>>>>>>>> ThingReceiver (a GenbankReceiver instance) as if the Genbank object
>>>>>>>> was being read from a file or some other source.
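The six roles above can be sketched in a single file. This is a minimal illustration, not the committed BJ3 code. The framework interfaces are reduced to hypothetical minimal signatures (getFinishedThing() and setReceiver() are assumed; canReadNextThing()/readNextThing() are the names mentioned in the thread). The Genbank class holds only an accession and a sequence. The text layout (ACCESSION, ORIGIN and // lines) is a toy loosely modelled on Genbank, not the real format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical minimal framework contracts; the real BJ3 signatures may differ.
interface Thing extends Serializable {}
interface ThingReceiver {}
interface ThingBuilder<T extends Thing> extends ThingReceiver { T getFinishedThing(); }
interface ThingWriter extends ThingReceiver {}
interface ThingReader {
    void setReceiver(ThingReceiver receiver);
    boolean canReadNextThing();
    void readNextThing();
}
interface ThingEmitter extends ThingReader {} // a reader whose source is an object

// 1. The data model (toy: accession and sequence only).
class Genbank implements Thing {
    String accession = "";
    String sequence = "";
}

// 2. The messages needed to build a Genbank object piece by piece.
interface GenbankReceiver extends ThingReceiver {
    void startRecord();
    void accession(String accession);
    void sequence(String residues);
    void endRecord();
}

// 3. Turns messages into Genbank objects.
class GenbankBuilder implements GenbankReceiver, ThingBuilder<Genbank> {
    private Genbank current, finished;
    public void startRecord() { current = new Genbank(); }
    public void accession(String a) { current.accession = a; }
    public void sequence(String s) { current.sequence += s; }
    public void endRecord() { finished = current; current = null; }
    public Genbank getFinishedThing() { return finished; }
}

// 4. Turns the same messages into text (a StringBuilder stands in for a file).
class GenbankWriter implements GenbankReceiver, ThingWriter {
    final StringBuilder out = new StringBuilder();
    public void startRecord() {}
    public void accession(String a) { out.append("ACCESSION ").append(a).append('\n'); }
    public void sequence(String s) { out.append("ORIGIN ").append(s).append('\n'); }
    public void endRecord() { out.append("//\n"); }
}

// 5. Parses the toy layout and fires messages at its receiver.
class GenbankReader implements ThingReader {
    private final BufferedReader in;
    private GenbankReceiver receiver;
    private String nextLine; // first line of the next record, or null at end of input

    GenbankReader(BufferedReader in) { this.in = in; this.nextLine = readLine(); }

    private String readLine() {
        try { return in.readLine(); } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    public void setReceiver(ThingReceiver r) { receiver = (GenbankReceiver) r; }
    public boolean canReadNextThing() { return nextLine != null; }
    public void readNextThing() {
        receiver.startRecord();
        for (String line = nextLine; line != null && !line.equals("//"); line = readLine()) {
            if (line.startsWith("ACCESSION ")) receiver.accession(line.substring(10));
            else if (line.startsWith("ORIGIN ")) receiver.sequence(line.substring(7));
        }
        receiver.endRecord();
        nextLine = readLine();
    }
}

// 6. Replays existing Genbank objects as if they were being read from a file.
class GenbankEmitter implements ThingEmitter {
    private final Deque<Genbank> pending;
    private GenbankReceiver receiver;

    GenbankEmitter(List<Genbank> records) { pending = new ArrayDeque<>(records); }

    public void setReceiver(ThingReceiver r) { receiver = (GenbankReceiver) r; }
    public boolean canReadNextThing() { return !pending.isEmpty(); }
    public void readNextThing() {
        Genbank g = pending.removeFirst();
        receiver.startRecord();
        receiver.accession(g.accession);
        receiver.sequence(g.sequence);
        receiver.endRecord();
    }
}
```

GenbankBuilder and GenbankWriter accept the same GenbankReceiver messages, so either can sit behind either GenbankReader or GenbankEmitter.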
>>>>>>>>
>>>>>>>> That's it! OK, so it's a minimum of 6 classes instead of the original
>>>>>>>> 1 or 2, but the additional steps are necessary for flexibility in
>>>>>>>> converting between formats.
>>>>>>>>
>>>>>>>> Now to use it (you'll probably want a GenbankTools class to wrap these
>>>>>>>> steps up for user-friendliness, including various options for opening
>>>>>>>> files, etc.):
>>>>>>>>
>>>>>>>> 1. To read a file - instantiate ThingParser with your GenbankReader as
>>>>>>>> the reader, and GenbankBuilder as the receiver. Use the iterator
>>>>>>>> methods on ThingParser to get the objects out.
>>>>>>>>
>>>>>>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter
>>>>>>>> wrapping your Genbank object, and a GenbankWriter as the receiver. Use
>>>>>>>> the parseAll() method on the ThingParser to dump the whole lot to your
>>>>>>>> chosen output.
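A ThingParser that supports both uses might look like the sketch below. It assumes the same kind of hypothetical minimal interfaces. For brevity, each record is collapsed into a single record(accession, sequence) message, and the reader takes a toy one-line-per-record layout from an in-memory list rather than a file.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical minimal contracts; the real BJ3 signatures may differ.
interface Thing {}
interface ThingReceiver {}
interface ThingBuilder<T extends Thing> extends ThingReceiver { T getFinishedThing(); }
interface ThingReader {
    void setReceiver(ThingReceiver receiver);
    boolean canReadNextThing();
    void readNextThing();
}
interface ThingEmitter extends ThingReader {}

// Toy record; one message per record keeps the sketch short.
class Genbank implements Thing {
    final String accession, sequence;
    Genbank(String accession, String sequence) { this.accession = accession; this.sequence = sequence; }
}
interface GenbankReceiver extends ThingReceiver { void record(String accession, String sequence); }

class GenbankBuilder implements GenbankReceiver, ThingBuilder<Genbank> {
    private Genbank last;
    public void record(String accession, String sequence) { last = new Genbank(accession, sequence); }
    public Genbank getFinishedThing() { return last; }
}

class GenbankWriter implements GenbankReceiver {
    final StringBuilder out = new StringBuilder(); // stands in for an output file
    public void record(String accession, String sequence) {
        out.append(accession).append(' ').append(sequence).append('\n');
    }
}

// Reads toy "accession sequence" lines from memory instead of a file.
class GenbankReader implements ThingReader {
    private final Deque<String> lines;
    private GenbankReceiver receiver;
    GenbankReader(List<String> lines) { this.lines = new ArrayDeque<>(lines); }
    public void setReceiver(ThingReceiver r) { receiver = (GenbankReceiver) r; }
    public boolean canReadNextThing() { return !lines.isEmpty(); }
    public void readNextThing() {
        String[] parts = lines.removeFirst().split(" ", 2);
        receiver.record(parts[0], parts[1]);
    }
}

// Replays Genbank objects as the same messages.
class GenbankEmitter implements ThingEmitter {
    private final Deque<Genbank> records;
    private GenbankReceiver receiver;
    GenbankEmitter(List<Genbank> records) { this.records = new ArrayDeque<>(records); }
    public void setReceiver(ThingReceiver r) { receiver = (GenbankReceiver) r; }
    public boolean canReadNextThing() { return !records.isEmpty(); }
    public void readNextThing() {
        Genbank g = records.removeFirst();
        receiver.record(g.accession, g.sequence);
    }
}

// Connects any reader or emitter to any receiver.
class ThingParser implements Iterator<Thing> {
    private final ThingReader reader;
    private final ThingReceiver receiver;

    ThingParser(ThingReader reader, ThingReceiver receiver) {
        this.reader = reader;
        this.receiver = receiver;
        reader.setReceiver(receiver);
    }

    @Override
    public boolean hasNext() { return reader.canReadNextThing(); }

    // Iteration only makes sense when the receiver is a ThingBuilder.
    @Override
    public Thing next() {
        if (!hasNext()) throw new NoSuchElementException();
        reader.readNextThing();
        return ((ThingBuilder<?>) receiver).getFinishedThing();
    }

    // Pushes every record through to the receiver, e.g. a writer.
    void parseAll() {
        while (reader.canReadNextThing()) reader.readNextThing();
    }
}
```

The cast in next() keeps the sketch short. A real design might take the builder as a separately typed argument, so that iterating with a writer as the receiver is caught at compile time rather than by a ClassCastException.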
>>>>>>>>
>>>>>>>> The clever bit comes when you want to convert between files. Imagine
>>>>>>>> you've done all the above for Genbank, and you've also done it for
>>>>>>>> FASTA. How do you convert between them? What you need to do is this:
>>>>>>>>
>>>>>>>> 1. Implement all the classes for both Genbank and FASTA.
>>>>>>>>
>>>>>>>> 2. Write a GenbankFASTAConverter class that implements
>>>>>>>> ThingConverter<FASTA> and GenbankReceiver, and will internally convert
>>>>>>>> the data received and pass it on out to the receiver provided, which
>>>>>>>> will be a FASTAReceiver instance.
>>>>>>>>
>>>>>>>> 3. Write a FASTAGenbankConverter class that operates in exactly the
>>>>>>>> opposite way, implementing ThingConverter<Genbank> and FASTAReceiver.
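A converter is a receiver for one format that forwards messages in another format. A minimal sketch, assuming hypothetical single-message receivers, a setReceiver() method for attaching the downstream receiver, and empty placeholder Genbank and FASTA classes that exist only to serve as type arguments.

```java
// Hypothetical minimal contracts; the real BJ3 signatures may differ.
interface Thing {}
interface ThingReceiver {}
// A converter receives one format and forwards converted messages downstream;
// T names the target Thing type.
interface ThingConverter<T extends Thing> extends ThingReceiver {
    void setReceiver(ThingReceiver downstream);
}

class Genbank implements Thing {}
class FASTA implements Thing {}

// Toy receivers: one message per record.
interface GenbankReceiver extends ThingReceiver {
    void record(String accession, String definition, String sequence);
}
interface FASTAReceiver extends ThingReceiver {
    void record(String header, String sequence);
}

class GenbankFASTAConverter implements ThingConverter<FASTA>, GenbankReceiver {
    private FASTAReceiver downstream;

    @Override
    public void setReceiver(ThingReceiver downstream) {
        this.downstream = (FASTAReceiver) downstream;
    }

    @Override
    public void record(String accession, String definition, String sequence) {
        // FASTA has no separate accession field, so fold it into the header.
        downstream.record(accession + " " + definition, sequence);
    }
}

class FASTAGenbankConverter implements ThingConverter<Genbank>, FASTAReceiver {
    private GenbankReceiver downstream;

    @Override
    public void setReceiver(ThingReceiver downstream) {
        this.downstream = (GenbankReceiver) downstream;
    }

    @Override
    public void record(String header, String sequence) {
        // Assume the first word of the header is the accession.
        int space = header.indexOf(' ');
        String accession = space < 0 ? header : header.substring(0, space);
        String definition = space < 0 ? "" : header.substring(space + 1);
        downstream.record(accession, definition, sequence);
    }
}
```

The FASTA-to-Genbank direction has to guess where the accession ends. Splitting the header at the first space is an assumption of this sketch.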
>>>>>>>>
>>>>>>>> Then to convert you use ThingParser again:
>>>>>>>>
>>>>>>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with a
>>>>>>>> FASTAReader reader, a GenbankBuilder receiver, and add a
>>>>>>>> FASTAGenbankConverter instance to the converter chain. Use the
>>>>>>>> iterator to get your Genbank objects out of your FASTA file.
>>>>>>>>
>>>>>>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a
>>>>>>>> GenbankWriter instead and use parseAll() instead of the iterator
>>>>>>>> methods.
>>>>>>>>
>>>>>>>> 3. From FASTA object to Genbank object: Same as option 1, but provide
>>>>>>>> a FASTAEmitter wrapping your FASTA object as the reader instead.
>>>>>>>>
>>>>>>>> 4. From FASTA object to Genbank file: Same as option 1, but swap both
>>>>>>>> the reader and the receiver as per options 2 and 3.
>>>>>>>>
>>>>>>>> 5/6/7/8. From Genbank * to FASTA * - same as 1, 2, 3, 4 but swap all
>>>>>>>> mentions of FASTA and Genbank, and use GenbankFASTAConverter instead.
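Option 1 (FASTA file to Genbank object) can be sketched by giving ThingParser a converter chain. The interfaces, the two-line toy FASTA layout read from an in-memory list, and the way the parser wires reader, converters and receiver together are all assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical minimal contracts; the real BJ3 signatures may differ.
interface Thing {}
interface ThingReceiver {}
interface ThingBuilder<T extends Thing> extends ThingReceiver { T getFinishedThing(); }
interface ThingConverter<T extends Thing> extends ThingReceiver { void setReceiver(ThingReceiver downstream); }
interface ThingReader {
    void setReceiver(ThingReceiver receiver);
    boolean canReadNextThing();
    void readNextThing();
}

class Genbank implements Thing {
    final String accession, sequence;
    Genbank(String accession, String sequence) { this.accession = accession; this.sequence = sequence; }
}
interface GenbankReceiver extends ThingReceiver { void record(String accession, String sequence); }
interface FASTAReceiver extends ThingReceiver { void record(String header, String sequence); }

class GenbankBuilder implements GenbankReceiver, ThingBuilder<Genbank> {
    private Genbank last;
    public void record(String accession, String sequence) { last = new Genbank(accession, sequence); }
    public Genbank getFinishedThing() { return last; }
}

// Toy FASTA reader over in-memory lines: a ">header" line, then one sequence line.
class FASTAReader implements ThingReader {
    private final List<String> lines;
    private int pos;
    private FASTAReceiver receiver;
    FASTAReader(List<String> lines) { this.lines = lines; }
    public void setReceiver(ThingReceiver r) { receiver = (FASTAReceiver) r; }
    public boolean canReadNextThing() { return pos + 1 < lines.size(); }
    public void readNextThing() {
        String header = lines.get(pos++).substring(1); // drop the leading '>'
        receiver.record(header, lines.get(pos++));
    }
}

class FASTAGenbankConverter implements ThingConverter<Genbank>, FASTAReceiver {
    private GenbankReceiver downstream;
    public void setReceiver(ThingReceiver r) { downstream = (GenbankReceiver) r; }
    public void record(String header, String sequence) {
        downstream.record(header.split(" ", 2)[0], sequence); // first word as accession
    }
}

class ThingParser implements Iterator<Thing> {
    private final ThingReader reader;
    private final ThingReceiver receiver;

    ThingParser(ThingReader reader, List<? extends ThingConverter<?>> chain, ThingReceiver receiver) {
        this.reader = reader;
        this.receiver = receiver;
        // Wire reader -> chain[0] -> ... -> chain[n-1] -> receiver, back to front.
        ThingReceiver downstream = receiver;
        for (int i = chain.size() - 1; i >= 0; i--) {
            chain.get(i).setReceiver(downstream);
            downstream = chain.get(i);
        }
        reader.setReceiver(downstream);
    }

    @Override
    public boolean hasNext() { return reader.canReadNextThing(); }

    @Override
    public Thing next() {
        if (!hasNext()) throw new NoSuchElementException();
        reader.readNextThing();
        return ((ThingBuilder<?>) receiver).getFinishedThing(); // needs a builder receiver
    }

    void parseAll() { while (reader.canReadNextThing()) reader.readNextThing(); }
}
```

In this sketch, options 2 to 4 would only change the constructor arguments: a writer as the receiver together with parseAll(), or an emitter in place of the reader. Those classes are not shown here.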
>>>>>>>>
>>>>>>>> One last and very important feature of this approach is that if you
>>>>>>>> discover that nobody has written the appropriate converter for your
>>>>>>>> chosen pair of formats A and C, but converters do exist to map A to
>>>>>>>> some other format B and that other format B on to C, then you can just
>>>>>>>> put the two converters A-B and B-C into the ThingParser chain and it'll
>>>>>>>> work perfectly.
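Chaining works because each converter is itself a receiver, so converters can be linked back to front. The minimal sketch below uses three toy formats A, B and C, where only A-to-B and B-to-C converters exist. Every name here is hypothetical, and the type parameter on ThingConverter is dropped for brevity.

```java
import java.util.List;

// Hypothetical minimal contracts; the real BJ3 signatures may differ.
interface ThingReceiver {}
interface ThingConverter extends ThingReceiver { void setReceiver(ThingReceiver downstream); }

// Three toy formats, each with a single message type.
interface AReceiver extends ThingReceiver { void a(String id, String residues); }
interface BReceiver extends ThingReceiver { void b(String id, String residues, int length); }
interface CReceiver extends ThingReceiver { void c(String summary); }

// Existing converter A -> B: adds a length field.
class AToB implements ThingConverter, AReceiver {
    private BReceiver next;
    public void setReceiver(ThingReceiver r) { next = (BReceiver) r; }
    public void a(String id, String residues) { next.b(id, residues, residues.length()); }
}

// Existing converter B -> C: flattens the record to one summary string.
class BToC implements ThingConverter, BReceiver {
    private CReceiver next;
    public void setReceiver(ThingReceiver r) { next = (CReceiver) r; }
    public void b(String id, String residues, int length) { next.c(id + " (" + length + " bp)"); }
}

class ConverterChain {
    // Links the converters back to front and returns the head, i.e. the
    // receiver that a format-A reader should be given.
    static ThingReceiver wire(List<? extends ThingConverter> chain, ThingReceiver sink) {
        ThingReceiver downstream = sink;
        for (int i = chain.size() - 1; i >= 0; i--) {
            chain.get(i).setReceiver(downstream);
            downstream = chain.get(i);
        }
        return downstream;
    }
}
```

No A-to-C converter is written anywhere. The A-to-C path exists only because the head of the chain accepts A messages and the tail emits C messages.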
>>>>>>>>
>>>>>>>> Enjoy!
>>>>>>>>
>>>>>>>> cheers,
>>>>>>>> Richard
>>>>>>>>
>>>>>>>> --
>>>>>>>> Richard Holland, BSc MBCS
>>>>>>>> Finance Director, Eagle Genomics Ltd
>>>>>>>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>>>>>>>> http://www.eaglegenomics.com/
>>>>>>>> _______________________________________________
>>>>>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>>>>>
>>>>>>
>>>>>>
>>>
>>>
>>>
> 
> 
> 


