[Biojava-l] File parsing in BJ3

Mark Schreiber markjschreiber at gmail.com
Tue Oct 21 10:35:14 UTC 2008


Is there any need for Thing at all? Can't a builder be typed to produce
something that extends Object?

If Thing provides no behaviour contract or meta-information then why
does it exist?
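To make the question concrete: a builder can be generic over any result type, with no `Thing` bound at all. `Builder`, `SeqRecord` and `RecordBuilder` below are hypothetical names, not part of the committed BJ3 code; this is only a sketch of the alternative being asked about.

```java
// Hypothetical builder contract with an unbounded type parameter:
// T can be any class, so no Thing marker is required.
interface Builder<T> {
    void accept(String key, String value); // receive one parsed datum
    T build();                             // return the finished object
}

// A plain value class that implements no marker interface at all.
final class SeqRecord {
    final String id;
    final String sequence;

    SeqRecord(String id, String sequence) {
        this.id = id;
        this.sequence = sequence;
    }
}

// Accumulates key/value data and produces a SeqRecord on demand.
final class RecordBuilder implements Builder<SeqRecord> {
    private String id = "";
    private final StringBuilder sequence = new StringBuilder();

    @Override
    public void accept(String key, String value) {
        if (key.equals("id")) {
            id = value;
        } else if (key.equals("sequence")) {
            sequence.append(value); // sequence may arrive in several chunks
        }
    }

    @Override
    public SeqRecord build() {
        return new SeqRecord(id, sequence.toString());
    }
}
```

The cost of dropping the bound is that there is no common type for "whatever a parser produces", which is the point Andy's reply addresses.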

- Mark

On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
> Depends on what you want to program. If you want to have a collection of
> objects which are Things & perform a common action on them then
> annotations are not the way forward.
>
> If you want to have some kind of meta-programming occurring & need a
> class to be multiple things then annotations are right. There is
> currently no way to enforce compile time dependencies on annotations &
> my thinking is that this is right. Annotations should be meta data or
> provide a way to alter a class in a non-invasive way (think Web Service
> annotations creating WS Servers & Clients without any alteration of the
> class).
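The trade-off Andy describes shows up in a few lines of plain Java. A marker interface is a compile-time type, so a `List<? extends Thing>` needs no run-time checks; a marker annotation has to be retained at run time and tested by reflection. `Thing` here only stands in for the BJ3 marker interface, and `ThingTag`, `GenbankLike`, `FastaLike` and `MarkerDemo` are hypothetical names for this sketch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;

// Option 1: a marker interface (stand-in for BJ3's Thing). It is a real
// type, so the compiler can check what goes into a collection.
interface Thing {}

// Option 2: a hypothetical marker annotation. It must be retained at
// RUNTIME to be visible to reflection, and it cannot be used as a type.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ThingTag {}

final class GenbankLike implements Thing {}

@ThingTag
final class FastaLike {}

final class MarkerDemo {
    // Compile-time typed: only Things can be passed in.
    static int countThings(List<? extends Thing> things) {
        return things.size();
    }

    // Annotation "membership" can only be checked at run time.
    static int countTagged(List<Object> objects) {
        int n = 0;
        for (Object o : objects) {
            if (o.getClass().isAnnotationPresent(ThingTag.class)) {
                n++;
            }
        }
        return n;
    }
}
```

`countTagged` has to accept `List<Object>` because the compiler cannot restrict a collection to annotated classes.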
>
> Andy
>
> Richard Holland wrote:
>> Spot on.
>>
>> Annotation or interface? I think Annotation is probably better, as you
>> suggest, but I'd have to look into that. I'm not sure how it works with
>> collections and generics. If it does turn out to be a better bet, I'll
>> change it over.
>>
>> For the BioSQL dependencies, take a look at the pom.xml file inside the
>> biojava-dna module. It declares a dependency on biojava-core. If you want to
>> add dependencies on external JARs, take a look at biojava-biosql's pom.xml
>> to see how it depends on javax.persistence. (The easiest way to add these is
>> via an IDE such as NetBeans, which is what I'm using at the moment.)
>>
>> cheers,
>> Richard
>>
>> 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
>>
>>> So if I want to build a BioSQL loader from Genbank, would the
>>> classes (or their wrappers) in the BioSQL Entity package need to
>>> implement Thing? Would Maven have an issue with that, or would it just
>>> create a dependency on core? (As you can tell, I've never used
>>> Maven.)
>>>
>>> From a design point of view, should Thing be an interface or an
>>> Annotation? The reason I ask is that it doesn't define any methods, so
>>> it is more of a tag than an interface.
>>>
>>> Anyway, my understanding is that I would use a Genbank parser (or
>>> write one), write an EntityReceiver interface (probably more than one,
>>> given the number of entities in BioSQL), and implement an EntityBuilder
>>> (again, possibly more than one) that implements EntityReceiver and
>>> builds Entity beans from the messages it receives. In this case I probably
>>> wouldn't provide a writer, as JPA would be writing the beans to the
>>> database. Is this how you imagine it?
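Mark's outline might translate into something like the following, assuming one receiver for the BioSQL `bioentry` table. `Bioentry`, `BioentryReceiver` and `BioentryBuilder` are hypothetical names; in a real loader the bean would be a JPA entity handed to an `EntityManager`, whereas here it is a plain class so the sketch compiles without dependencies.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a BioSQL bioentry bean. In a real loader this would be
// a JPA @Entity; here it is a plain class so the sketch has no dependencies.
final class Bioentry {
    String name;
    String accession;
    int version;
}

// Hypothetical receiver: one callback per datum that maps onto bioentry.
interface BioentryReceiver {
    void startEntry();
    void name(String name);
    void accession(String accession);
    void version(int version);
    void endEntry();
}

// Builds Bioentry beans from the messages it receives. No writer is
// needed: the finished beans would be persisted by JPA instead.
final class BioentryBuilder implements BioentryReceiver {
    private final List<Bioentry> finished = new ArrayList<>();
    private Bioentry current;

    @Override
    public void startEntry() {
        current = new Bioentry();
    }

    @Override
    public void name(String name) {
        current.name = name;
    }

    @Override
    public void accession(String accession) {
        current.accession = accession;
    }

    @Override
    public void version(int version) {
        current.version = version;
    }

    @Override
    public void endEntry() {
        finished.add(current);
        current = null;
    }

    // Beans completed so far, ready to be persisted.
    List<Bioentry> getFinished() {
        return finished;
    }
}
```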
>>>
>>> - Mark
>>>
>>>
>>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland
>>> <holland at eaglegenomics.com> wrote:
>>>> (From now on I will only be posting these development messages to
>>>> biojava-dev, which is the intended purpose of that list. Those of you who
>>>> wish to keep track of things but are currently only subscribed to
>>>> biojava-l should also subscribe to biojava-dev in order to keep up to date.)
>>>>
>>>> As promised, I've committed a new package in the biojava-core module that
>>>> should help you understand how to do file parsing, conversion and writing
>>>> in the new BJ3 modules. Here's an example of how to use it to write a
>>>> Genbank parser (note that no parsers actually exist yet!):
>>>>
>>>> 1. Design yourself a Genbank class which implements the interface Thing
>>>> and can fully represent all the data that might possibly occur inside a
>>>> Genbank file.
>>>>
>>>> 2. Write an interface called GenbankReceiver, which extends ThingReceiver
>>>> and defines all the methods you might need in order to construct a
>>>> Genbank object in an asynchronous fashion.
>>>>
>>>> 3. Write a GenbankBuilder class which implements GenbankReceiver and
>>>> ThingBuilder. Its job is to receive data via method calls, use that data
>>>> to construct a Genbank object, then provide that object on demand.
>>>>
>>>> 4. Write a GenbankWriter class which implements GenbankReceiver and
>>>> ThingWriter. Its job is similar to GenbankBuilder's, but instead of
>>>> constructing new Genbank objects, it writes Genbank records to file that
>>>> reflect the data it receives.
>>>>
>>>> 5. Write a GenbankReader class which implements ThingReader. It can read
>>>> Genbank files and output the data to the methods of the ThingReceiver
>>>> provided to it, which in this case could be anything which implements the
>>>> interface GenbankReceiver.
>>>>
>>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It takes a
>>>> Genbank object and fires off data from it to the provided ThingReceiver
>>>> (a GenbankReceiver instance) as if the Genbank object were being read
>>>> from a file or some other source.
>>>>
>>>> That's it! OK, so it's a minimum of 6 classes instead of the original 1
>>>> or 2, but the additional steps are necessary for flexibility in
>>>> converting between formats.
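For orientation, the six roles can be compressed into one small Java sketch. The post gives no method signatures, so every signature here is an assumption: `readNext`, `getFinishedThing`, the three `GenbankReceiver` callbacks, and the record layout (an `ACCESSION` line, an `ORIGIN` line, sequence lines, then `//`) are illustrative and fall far short of real Genbank. The emitter is modelled as a reader over an in-memory object, since the usage notes put emitters in the reader's slot.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical core types; the real BJ3 signatures may well differ.
interface Thing {}
interface ThingReceiver {}
interface ThingWriter extends ThingReceiver {}
interface ThingBuilder<T extends Thing> extends ThingReceiver {
    T getFinishedThing();
}
interface ThingReader<R extends ThingReceiver> {
    // Pushes the next record into the receiver; false when exhausted.
    boolean readNext(R receiver) throws IOException;
}
interface ThingEmitter<R extends ThingReceiver> extends ThingReader<R> {}

// Step 1: the data model (two fields only, for brevity).
final class Genbank implements Thing {
    final String accession;
    final String sequence;
    Genbank(String accession, String sequence) {
        this.accession = accession;
        this.sequence = sequence;
    }
}

// Step 2: the callbacks needed to describe one record.
interface GenbankReceiver extends ThingReceiver {
    void accession(String accession);
    void sequence(String residues);
    void endRecord();
}

// Step 3: turns callbacks into Genbank objects.
final class GenbankBuilder implements GenbankReceiver, ThingBuilder<Genbank> {
    private String accession;
    private final StringBuilder residues = new StringBuilder();
    private Genbank finished;
    public void accession(String accession) { this.accession = accession; }
    public void sequence(String r) { residues.append(r); }
    public void endRecord() {
        finished = new Genbank(accession, residues.toString());
        accession = null;
        residues.setLength(0);
    }
    public Genbank getFinishedThing() { return finished; }
}

// Step 4: turns the same callbacks into (greatly simplified) text.
final class GenbankWriter implements GenbankReceiver, ThingWriter {
    private final StringBuilder out;
    GenbankWriter(StringBuilder out) { this.out = out; }
    public void accession(String a) { out.append("ACCESSION   ").append(a).append('\n'); }
    public void sequence(String r) { out.append("ORIGIN\n").append(r).append('\n'); }
    public void endRecord() { out.append("//\n"); }
}

// Step 5: parses text and fires callbacks, one record per call.
final class GenbankReader implements ThingReader<GenbankReceiver> {
    private final BufferedReader in;
    GenbankReader(BufferedReader in) { this.in = in; }
    public boolean readNext(GenbankReceiver r) throws IOException {
        StringBuilder residues = new StringBuilder();
        boolean inSequence = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("ACCESSION")) {
                r.accession(line.substring("ACCESSION".length()).trim());
            } else if (line.startsWith("ORIGIN")) {
                inSequence = true;
            } else if (line.startsWith("//")) {
                r.sequence(residues.toString());
                r.endRecord();
                return true;
            } else if (inSequence) {
                residues.append(line.replaceAll("[^A-Za-z]", "")); // drop numbers/spaces
            }
        }
        return false;
    }
}

// Step 6: replays an in-memory Genbank object as if it were being read.
final class GenbankEmitter implements ThingEmitter<GenbankReceiver> {
    private Genbank pending;
    GenbankEmitter(Genbank g) { pending = g; }
    public boolean readNext(GenbankReceiver r) {
        if (pending == null) return false;
        r.accession(pending.accession);
        r.sequence(pending.sequence);
        r.endRecord();
        pending = null;
        return true;
    }
}
```

The builder and the writer implement the same `GenbankReceiver`, which is what lets a reader or emitter feed either one unchanged.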
>>>>
>>>> Now to use it (you'll probably want a GenbankTools class to wrap these
>>>> steps up for user-friendliness, including various options for opening
>>>> files, etc.):
>>>>
>>>> 1. To read a file - instantiate ThingParser with your GenbankReader as
>>>> the reader, and GenbankBuilder as the receiver. Use the iterator methods
>>>> on ThingParser to get the objects out.
>>>>
>>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter
>>>> wrapping your Genbank object, and a GenbankWriter as the receiver. Use the
>>>> parseAll() method on the ThingParser to dump the whole lot to your chosen
>>>> output.
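The two usage modes might look like this with a minimal `ThingParser`: as an `Iterator` it pulls one record at a time through a builder, and `parseAll()` pushes every record into whatever receiver it holds, typically a writer. The interfaces, the trimmed accession-only Genbank model and the parser internals are assumptions for illustration, not the committed API. To keep the block short it declares only a reader; a `GenbankEmitter` would fill the same slot when writing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical core types (signatures assumed, not taken from BJ3).
interface Thing {}
interface ThingReceiver {}
interface ThingBuilder<T extends Thing> extends ThingReceiver { T getFinishedThing(); }
interface ThingReader<R extends ThingReceiver> { boolean readNext(R receiver) throws IOException; }

// Trimmed Genbank model: accession only.
final class Genbank implements Thing {
    final String accession;
    Genbank(String accession) { this.accession = accession; }
}
interface GenbankReceiver extends ThingReceiver { void accession(String a); void endRecord(); }

final class GenbankBuilder implements GenbankReceiver, ThingBuilder<Genbank> {
    private String accession;
    private Genbank finished;
    public void accession(String a) { accession = a; }
    public void endRecord() { finished = new Genbank(accession); }
    public Genbank getFinishedThing() { return finished; }
}

final class GenbankWriter implements GenbankReceiver {
    final StringBuilder out = new StringBuilder();
    public void accession(String a) { out.append("ACCESSION   ").append(a).append('\n'); }
    public void endRecord() { out.append("//\n"); }
}

// Recognises only ACCESSION lines and "//" record terminators.
final class GenbankReader implements ThingReader<GenbankReceiver> {
    private final BufferedReader in;
    GenbankReader(BufferedReader in) { this.in = in; }
    public boolean readNext(GenbankReceiver r) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("ACCESSION")) {
                r.accession(line.substring("ACCESSION".length()).trim());
            } else if (line.startsWith("//")) {
                r.endRecord();
                return true;
            }
        }
        return false;
    }
}

// Pull mode: iterate when the receiver is a ThingBuilder.
// Push mode: parseAll() drives every record into the receiver (e.g. a writer).
final class ThingParser<R extends ThingReceiver, T extends Thing> implements Iterator<T> {
    private final ThingReader<R> reader;
    private final R receiver;
    private T lookahead;       // record already read by hasNext()
    private boolean exhausted;

    ThingParser(ThingReader<R> reader, R receiver) {
        this.reader = reader;
        this.receiver = receiver;
    }

    @SuppressWarnings("unchecked")
    @Override
    public boolean hasNext() {
        if (lookahead == null && !exhausted) {
            try {
                if (reader.readNext(receiver)) {
                    // Throws ClassCastException if the receiver is not a builder.
                    lookahead = ((ThingBuilder<T>) receiver).getFinishedThing();
                } else {
                    exhausted = true;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return lookahead != null;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        T thing = lookahead;
        lookahead = null;
        return thing;
    }

    void parseAll() throws IOException {
        while (reader.readNext(receiver)) {
            // each call pushes one complete record into the receiver
        }
    }
}
```

Calling `hasNext()` on a parser whose receiver is a writer fails with a `ClassCastException`; a real implementation would probably check the receiver type up front.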
>>>>
>>>> The clever bit comes when you want to convert between files. Imagine
>>>> you've done all the above for Genbank, and you've also done it for FASTA.
>>>> How do you convert between them? What you need to do is this:
>>>>
>>>> 1. Implement all the classes for both Genbank and FASTA.
>>>>
>>>> 2. Write a GenbankFASTAConverter class that implements
>>>> ThingConverter<FASTA> and GenbankReceiver, and will internally convert the
>>>> data received and pass it on to the receiver provided, which will be a
>>>> FASTAReceiver instance.
>>>>
>>>> 3. Write a FASTAGenbankConverter class that operates in exactly the
>>>> opposite way, implementing ThingConverter<Genbank> and FASTAReceiver.
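The two converters from steps 2 and 3 could be sketched as below. The post parameterises `ThingConverter` by the target Thing (`ThingConverter<FASTA>`); this sketch parameterises it by the downstream receiver interface instead, so that `setReceiver` is type-safe. That choice, the method names, and the rule that the accession is the first word of a FASTA header are all assumptions. The two writers exist only so the converters can be exercised.

```java
// Hypothetical core types (signatures assumed, not taken from BJ3).
interface ThingReceiver {}
interface ThingConverter<R extends ThingReceiver> extends ThingReceiver {
    void setReceiver(R downstream); // where converted data is sent
}

interface GenbankReceiver extends ThingReceiver {
    void accession(String accession);
    void sequence(String residues);
    void endRecord();
}

interface FASTAReceiver extends ThingReceiver {
    void header(String header);
    void sequence(String residues);
    void endRecord();
}

// Step 2: receives Genbank callbacks, emits FASTA callbacks.
final class GenbankFASTAConverter implements GenbankReceiver, ThingConverter<FASTAReceiver> {
    private FASTAReceiver out;
    public void setReceiver(FASTAReceiver downstream) { out = downstream; }
    public void accession(String accession) { out.header(accession); }
    public void sequence(String residues) { out.sequence(residues); }
    public void endRecord() { out.endRecord(); }
}

// Step 3: the opposite direction. Assumes the accession is the first
// word of the FASTA header; real headers vary widely.
final class FASTAGenbankConverter implements FASTAReceiver, ThingConverter<GenbankReceiver> {
    private GenbankReceiver out;
    public void setReceiver(GenbankReceiver downstream) { out = downstream; }
    public void header(String header) { out.accession(header.trim().split("\\s+")[0]); }
    public void sequence(String residues) { out.sequence(residues); }
    public void endRecord() { out.endRecord(); }
}

// Minimal writers, only so the converters have somewhere to send data.
final class FASTAWriter implements FASTAReceiver {
    final StringBuilder out = new StringBuilder();
    public void header(String h) { out.append('>').append(h).append('\n'); }
    public void sequence(String r) { out.append(r).append('\n'); }
    public void endRecord() { } // FASTA has no record terminator
}

final class GenbankWriter implements GenbankReceiver {
    final StringBuilder out = new StringBuilder();
    public void accession(String a) { out.append("ACCESSION   ").append(a).append('\n'); }
    public void sequence(String r) { out.append("ORIGIN\n").append(r).append('\n'); }
    public void endRecord() { out.append("//\n"); }
}
```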
>>>>
>>>> Then to convert you use ThingParser again:
>>>>
>>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with a
>>>> FASTAReader reader, a GenbankBuilder receiver, and add a
>>>> FASTAGenbankConverter instance to the converter chain. Use the iterator
>>>> to get your Genbank objects out of your FASTA file.
>>>>
>>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a
>>>> GenbankWriter instead and use parseAll() instead of the iterator methods.
>>>>
>>>> 3. From FASTA object to Genbank object: Same as option 1, but provide a
>>>> FASTAEmitter wrapping your FASTA object as the reader instead.
>>>>
>>>> 4. From FASTA object to Genbank file: Same as option 1, but swap both the
>>>> reader and the receiver as per options 2 and 3.
>>>>
>>>> 5/6/7/8. From Genbank * to FASTA * - same as 1, 2, 3, 4 but swap all
>>>> mentions of FASTA and Genbank, and use GenbankFASTAConverter instead.
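Option 3 (FASTA object to Genbank object) is the shortest to demonstrate, because an emitter stands in for the reader. The sketch assumes a `ThingParser` that links each converter to the next, feeds the reader's output into the head of the chain, and delivers the last converter's output to the receiver. Iteration is left out to save space: `parseAll()` runs the chain and the builder holds the result. All types and signatures are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical core types (signatures assumed).
interface ThingReceiver {}
interface ThingReader<R extends ThingReceiver> { boolean readNext(R receiver) throws IOException; }
interface ThingConverter<R extends ThingReceiver> extends ThingReceiver { void setReceiver(R downstream); }

interface FASTAReceiver extends ThingReceiver {
    void header(String header);
    void sequence(String residues);
    void endRecord();
}
interface GenbankReceiver extends ThingReceiver {
    void accession(String accession);
    void sequence(String residues);
    void endRecord();
}

final class FASTA {
    final String header, sequence;
    FASTA(String header, String sequence) { this.header = header; this.sequence = sequence; }
}
final class Genbank {
    final String accession, sequence;
    Genbank(String accession, String sequence) { this.accession = accession; this.sequence = sequence; }
}

// Stands in the reader slot: replays one FASTA object.
final class FASTAEmitter implements ThingReader<FASTAReceiver> {
    private FASTA pending;
    FASTAEmitter(FASTA f) { pending = f; }
    public boolean readNext(FASTAReceiver r) {
        if (pending == null) return false;
        r.header(pending.header);
        r.sequence(pending.sequence);
        r.endRecord();
        pending = null;
        return true;
    }
}

// Accession taken as the first word of the FASTA header (an assumption).
final class FASTAGenbankConverter implements FASTAReceiver, ThingConverter<GenbankReceiver> {
    private GenbankReceiver out;
    public void setReceiver(GenbankReceiver downstream) { out = downstream; }
    public void header(String h) { out.accession(h.trim().split("\\s+")[0]); }
    public void sequence(String s) { out.sequence(s); }
    public void endRecord() { out.endRecord(); }
}

final class GenbankBuilder implements GenbankReceiver {
    private String accession, residues;
    Genbank finished;
    public void accession(String a) { accession = a; }
    public void sequence(String s) { residues = s; }
    public void endRecord() { finished = new Genbank(accession, residues); }
}

// Reader -> converter 1 -> ... -> converter n -> receiver.
final class ThingParser {
    private final ThingReader<?> reader;
    private final ThingReceiver receiver;
    private final List<ThingConverter<?>> chain = new ArrayList<>();

    ThingParser(ThingReader<?> reader, ThingReceiver receiver) {
        this.reader = reader;
        this.receiver = receiver;
    }

    ThingParser addConverter(ThingConverter<?> converter) {
        chain.add(converter);
        return this;
    }

    // Element types differ along the chain, so wiring uses raw types;
    // a mismatched chain fails at run time with ClassCastException.
    @SuppressWarnings({"unchecked", "rawtypes"})
    void parseAll() throws IOException {
        ThingReceiver head = receiver;
        for (int i = chain.size() - 1; i >= 0; i--) { // wire back to front
            ThingConverter converter = chain.get(i);
            converter.setReceiver(head);
            head = converter;
        }
        ThingReader rawReader = reader;
        while (rawReader.readNext(head)) {
            // each call pushes one record through the whole chain
        }
    }
}
```

The same wiring serves the other options: a file reader can take the emitter's place, or a writer the builder's, without touching the converter.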
>>>>
>>>> One last and very important feature of this approach is that if you
>>>> discover that nobody has written the appropriate converter for your chosen
>>>> pair of formats A and C, but converters do exist to map A to some other
>>>> format B and that other format B on to C, then you can just put the two
>>>> converters A-B and B-C into the ThingParser chain and it'll work perfectly.
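Because each converter only knows its own input and output formats, the A-B and B-C converters can be stacked. The sketch below reduces three placeholder formats A, B and C to a single callback each; all names are hypothetical, and the downstream receiver is passed to each converter's constructor for brevity rather than set by a parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical receivers for three formats, one callback each.
interface AReceiver { void a(String datum); }
interface BReceiver { void b(String datum); }
interface CReceiver { void c(String datum); }

// Existing converter A -> B (here it just wraps the datum in brackets).
final class AtoB implements AReceiver {
    private final BReceiver out;
    AtoB(BReceiver out) { this.out = out; }
    public void a(String datum) { out.b("[" + datum + "]"); }
}

// Existing converter B -> C (here it just upper-cases the datum).
final class BtoC implements BReceiver {
    private final CReceiver out;
    BtoC(CReceiver out) { this.out = out; }
    public void b(String datum) { out.c(datum.toUpperCase()); }
}

// Collects whatever arrives in format C.
final class CCollector implements CReceiver {
    final List<String> received = new ArrayList<>();
    public void c(String datum) { received.add(datum); }
}

final class Chain {
    // No A -> C converter exists, so compose A -> B and B -> C.
    static List<String> convert(List<String> aData) {
        CCollector sink = new CCollector();
        AReceiver head = new AtoB(new BtoC(sink));
        for (String d : aData) {
            head.a(d); // a reader for format A would make these calls
        }
        return sink.received;
    }
}
```

One caveat of going through B: anything that format B cannot represent is lost on the way from A to C.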
>>>>
>>>> Enjoy!
>>>>
>>>> cheers,
>>>> Richard
>>>>
>>>> --
>>>> Richard Holland, BSc MBCS
>>>> Finance Director, Eagle Genomics Ltd
>>>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>>>> http://www.eaglegenomics.com/
>>>> _______________________________________________
>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>
>>
>>
>>
>


