[Biojava-l] File parsing in BJ3

Andy Yates ayates at ebi.ac.uk
Tue Oct 21 08:49:47 UTC 2008


Depends on what you want to program. If you want to have a collection of
objects which are Things & perform a common action on them then
annotations are not the way forward.

If you want to have some kind of meta-programming occurring & need a
class to be multiple things then annotations are right. There is
currently no way to enforce compile time dependencies on annotations &
my thinking is that this is right. Annotations should be meta data or
provide a way to alter a class in a non-invasive way (think Web Service
annotations creating WS Servers & Clients without any alteration of the
class).
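
As a rough illustration of the difference, here is a minimal Java sketch. Every name in it (Thing as an empty marker interface, the ThingTag annotation, the Genbank and Fasta classes, collect, isTagged) is hypothetical and not taken from the BJ3 code. It only shows that an interface can serve as a compile-time generic bound, while a marker annotation has to be checked reflectively at runtime.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

class MarkerVsAnnotation {
    // Hypothetical marker interface: declares no methods, but is a real type.
    interface Thing {}

    // Hypothetical marker annotation: metadata only, visible via reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ThingTag {}

    static class Genbank implements Thing {}

    @ThingTag
    static class Fasta {}

    // The bound "T extends Thing" is enforced by the compiler:
    // collect(new Fasta()) would not compile, because Fasta is not a Thing.
    static <T extends Thing> List<T> collect(T first) {
        List<T> things = new ArrayList<>();
        things.add(first);
        return things;
    }

    // An annotation cannot be used as a type bound; it can only be
    // inspected at runtime (or by an annotation processor).
    static boolean isTagged(Object candidate) {
        return candidate.getClass().isAnnotationPresent(ThingTag.class);
    }

    public static void main(String[] args) {
        List<Genbank> things = collect(new Genbank());
        System.out.println("collected: " + things.size());
        System.out.println("Fasta tagged: " + isTagged(new Fasta()));
        System.out.println("Genbank tagged: " + isTagged(new Genbank()));
    }
}
```

So a collection of Things with a common action favours the interface, and the annotation suits the meta-data case.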

Andy

Richard Holland wrote:
> Spot on.
> 
> Annotation/interface... I think Annotation is probably better, as you
> suggest, but I'd have to look into that. Not sure how it works with
> collections and generics. If it does turn out to be a better bet, I'll
> change it over.
> 
> With the BioSQL dependencies, take a look at the pom.xml file inside the
> biojava-dna module. It declares a dependency on biojava-core. If you want to
> add dependencies to external JARs, take a look at biojava-biosql's pom.xml
> to see how it depends on javax.persistence. (The easiest way to add these is
> via an IDE such as NetBeans, which is what I'm using at the moment).
> 
> cheers,
> Richard
> 
> 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
> 
>> So if I want to build a BioSQL loader from Genbank then would the
>> classes (or their wrappers) in the BioSQL Entity package need to
>> implement Thing?  Would Maven have an issue with that or would it just
>> create a dependency on core? (You can tell I've never used Maven,
>> right?)
>>
>> From a design point of view should Thing be an interface or an
>> Annotation? The reason I ask is that it doesn't define any methods so
>> it is more of a tag than an interface.
>>
>> Anyway, my understanding is that I would use a Genbank parser (or
>> write one), write an EntityReceiver interface (probably more than one
>> given the number of entities in BioSQL), and implement an EntityBuilder
>> (again possibly more than one) that implements EntityReceiver and
>> builds Entity beans from the messages it receives. In this case I probably
>> wouldn't provide a writer as JPA would be writing the beans to the
>> database.  Would this be how you imagine it?
>>
>> - Mark
>>
>>
>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland
>> <holland at eaglegenomics.com> wrote:
>>> (From now on I will only be posting these development messages to
>>> biojava-dev, which is the intended purpose of that list. Those of you who
>>> wish to keep track of things but are currently only subscribed to
>>> biojava-l should also subscribe to biojava-dev in order to keep up to date.)
>>>
>>> As promised, I've committed a new package in the biojava-core module that
>>> should help understand how to do file parsing and conversion and writing
>>> in the new BJ3 modules. Here's an example of how to use it to write a
>>> Genbank parser (note no parsers actually exist yet!):
>>>
>>> 1. Design yourself a Genbank class which implements the interface Thing
>>> and can fully represent all the data that might possibly occur inside a
>>> Genbank file.
>>>
>>> 2. Write an interface called GenbankReceiver, which extends ThingReceiver
>>> and defines all the methods you might need in order to construct a
>>> Genbank object in an asynchronous fashion.
>>>
>>> 3. Write a GenbankBuilder class which implements GenbankReceiver and
>>> ThingBuilder. Its job is to receive data via method calls, use that data
>>> to construct a Genbank object, then provide that object on demand.
>>>
>>> 4. Write a GenbankWriter class which implements GenbankReceiver and
>>> ThingWriter. Its job is similar to GenbankBuilder, but instead of
>>> constructing new Genbank objects, it writes Genbank records to file that
>>> reflect the data it receives.
>>>
>>> 5. Write a GenbankReader class which implements ThingReader. It can read
>>> Genbank files and output the data to the methods of the ThingReceiver
>>> provided to it, which in this case could be anything which implements the
>>> interface GenbankReceiver.
>>>
>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It takes a
>>> Genbank object and will fire off data from it to the provided
>>> ThingReceiver (a GenbankReceiver instance) as if the Genbank object was
>>> being read from a file or some other source.
>>>
>>> That's it! OK so it's a minimum of 6 classes instead of the original 1 or
>>> 2, but the additional steps are necessary for flexibility in converting
>>> between formats.
>>>
>>> Now to use it (you'll probably want a GenbankTools class to wrap these
>>> steps up for user-friendliness, including various options for opening
>>> files, etc.):
>>>
>>> 1. To read a file - instantiate ThingParser with your GenbankReader as
>>> the reader, and GenbankBuilder as the receiver. Use the iterator methods
>>> on ThingParser to get the objects out.
>>>
>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter
>>> wrapping your Genbank object, and a GenbankWriter as the receiver. Use the
>>> parseAll() method on the ThingParser to dump the whole lot to your chosen
>>> output.
>>>
>>> The clever bit comes when you want to convert between files. Imagine
>>> you've done all the above for Genbank, and you've also done it for FASTA.
>>> How to convert between them? What you need to do is this:
>>>
>>> 1. Implement all the classes for both Genbank and FASTA.
>>>
>>> 2. Write a GenbankFASTAConverter class that implements
>>> ThingConverter<FASTA> and GenbankReceiver, and will internally convert the
>>> data received and pass it on out to the receiver provided, which will be a
>>> FASTAReceiver instance.
>>>
>>> 3. Write a FASTAGenbankConverter class that operates in exactly the
>>> opposite way, implementing ThingConverter<Genbank> and FASTAReceiver.
>>>
>>> Then to convert you use ThingParser again:
>>>
>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with a
>>> FASTAReader reader, a GenbankBuilder receiver, and add a
>>> FASTAGenbankConverter instance to the converter chain. Use the iterator
>>> to get your Genbank objects out of your FASTA file.
>>>
>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a
>>> GenbankWriter as the receiver instead and use parseAll() instead of the
>>> iterator methods.
>>>
>>> 3. From FASTA object to Genbank object: Same as option 1, but provide a
>>> FASTAEmitter wrapping your FASTA object as the reader instead.
>>>
>>> 4. From FASTA object to Genbank file: Same as option 1, but swap both the
>>> reader and the receiver as per options 2 and 3.
>>>
>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all
>>> mentions of FASTA and Genbank, and use GenbankFASTAConverter instead.
>>>
>>> One last and very important feature of this approach is that if you
>>> discover that nobody has written the appropriate converter for your chosen
>>> pair of formats A and C, but converters do exist to map A to some other
>>> format B and that other format B on to C, then you can just put the two
>>> converters A-B and B-C into the ThingParser chain and it'll work perfectly.
>>>
>>> Enjoy!
>>>
>>> cheers,
>>> Richard
>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Finance Director, Eagle Genomics Ltd
>>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>> _______________________________________________
>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
> 
> 
> 
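
To make the reader / converter / builder flow from Richard's quoted message a little more concrete, here is a minimal, self-contained Java sketch. It is not the BJ3 API. The receiver method names (startRecord, header, sequence, locus, origin, endRecord), the two-field Genbank class, the readFasta and convert helpers and the conversion rules are all assumptions made up for illustration, and the real ThingParser, ThingReader and ThingConverter signatures are not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PipelineSketch {
    // Hypothetical receiver interfaces: one callback per piece of data.
    interface FastaReceiver {
        void startRecord();
        void header(String text);
        void sequence(String residues);
        void endRecord();
    }

    interface GenbankReceiver {
        void startRecord();
        void locus(String name);
        void origin(String residues);
        void endRecord();
    }

    // Hypothetical, heavily simplified Genbank record.
    static final class Genbank {
        final String locus;
        final String origin;
        Genbank(String locus, String origin) {
            this.locus = locus;
            this.origin = origin;
        }
    }

    // Builder: receives data via method calls and assembles objects.
    static final class GenbankBuilder implements GenbankReceiver {
        private final List<Genbank> built = new ArrayList<>();
        private String locus;
        private String origin;
        public void startRecord() { locus = null; origin = null; }
        public void locus(String name) { locus = name; }
        public void origin(String residues) { origin = residues; }
        public void endRecord() { built.add(new Genbank(locus, origin)); }
        List<Genbank> results() { return built; }
    }

    // Converter: looks like a FastaReceiver to the reader, and forwards
    // translated calls to whatever GenbankReceiver it was given.
    static final class FastaToGenbankConverter implements FastaReceiver {
        private final GenbankReceiver next;
        FastaToGenbankConverter(GenbankReceiver next) { this.next = next; }
        public void startRecord() { next.startRecord(); }
        // Assumption: the first word of the FASTA header becomes the locus.
        public void header(String text) { next.locus(text.split("\\s+")[0]); }
        // Assumption: the sequence is passed on in lower case.
        public void sequence(String residues) { next.origin(residues.toLowerCase()); }
        public void endRecord() { next.endRecord(); }
    }

    // Reader: walks FASTA lines and fires events at the receiver.
    static void readFasta(List<String> lines, FastaReceiver receiver) {
        StringBuilder seq = null;
        for (String line : lines) {
            if (line.startsWith(">")) {
                if (seq != null) { receiver.sequence(seq.toString()); receiver.endRecord(); }
                receiver.startRecord();
                receiver.header(line.substring(1).trim());
                seq = new StringBuilder();
            } else if (seq != null) {
                seq.append(line.trim());
            }
        }
        if (seq != null) { receiver.sequence(seq.toString()); receiver.endRecord(); }
    }

    // Wiring: reader -> converter -> builder.
    static List<Genbank> convert(List<String> fastaLines) {
        GenbankBuilder builder = new GenbankBuilder();
        readFasta(fastaLines, new FastaToGenbankConverter(builder));
        return builder.results();
    }

    public static void main(String[] args) {
        List<String> fasta = Arrays.asList(">seq1 example", "ACGT", "TTGA", ">seq2", "GGCC");
        for (Genbank g : convert(fasta)) {
            System.out.println(g.locus + " " + g.origin);
        }
    }
}
```

Because the converter is itself just a receiver that forwards to another receiver, an A-to-B converter can be handed a B-to-C converter as its target, which is the chaining idea described in the quoted message.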


