[Biojava-dev] BioJava3 Use Cases

Thu Jul 31 15:47:37 UTC 2008

In many ways that is what exists already, save an actual ontology, and
I was definitely hoping to build on the existing methods in BioJava 3. The
current system parses files by translating them into a common object
model, which itself is based on the BioSQL object model, and then
persists to BioSQL pretty much on a 1:1 basis. It would not be
difficult to extend this to translate any format to this common layer,
then transfer that common layer onto a database.
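
To make the current design concrete, here is a minimal Java sketch. It assumes a toy FASTA input and uses an in-memory map in place of BioSQL. Each parser translates its format into one shared model, and a single writer persists that model one record per entry. All class names (`CommonSequence`, `FormatParser`, `FastaParser`, `InMemoryStore`) are hypothetical and are not the BioJava API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: every parser translates into one common model,
// and one writer persists that model. Not the actual BioJava API.
public class CommonLayerSketch {

    /** Common-model record; loosely BioSQL-like (assumption). */
    public static final class CommonSequence {
        public final String accession;
        public final String sequence;

        public CommonSequence(String accession, String sequence) {
            this.accession = accession;
            this.sequence = sequence;
        }
    }

    /** One implementation per file format, each producing the common model. */
    public interface FormatParser {
        List<CommonSequence> parse(String text);
    }

    /** Toy FASTA parser: first header token is the accession, other lines are residues. */
    public static final class FastaParser implements FormatParser {
        @Override
        public List<CommonSequence> parse(String text) {
            List<CommonSequence> out = new ArrayList<>();
            String acc = null;
            StringBuilder seq = new StringBuilder();
            for (String line : text.split("\n")) {
                if (line.startsWith(">")) {
                    // A new header closes the previous record, if any.
                    if (acc != null) out.add(new CommonSequence(acc, seq.toString()));
                    acc = line.substring(1).trim().split("\\s+")[0];
                    seq.setLength(0);
                } else {
                    seq.append(line.trim());
                }
            }
            if (acc != null) out.add(new CommonSequence(acc, seq.toString()));
            return out;
        }
    }

    /** Stand-in for the database: one stored entry per common-model object (1:1). */
    public static final class InMemoryStore {
        public final Map<String, CommonSequence> rows = new LinkedHashMap<>();

        public void persist(List<CommonSequence> records) {
            for (CommonSequence r : records) rows.put(r.accession, r);
        }
    }

    public static void main(String[] args) {
        FormatParser parser = new FastaParser();
        InMemoryStore store = new InMemoryStore();
        store.persist(parser.parse(">seq1 demo\nACGT\nTTGA\n>seq2\nGGCC\n"));
        System.out.println(store.rows.keySet());             // [seq1, seq2]
        System.out.println(store.rows.get("seq1").sequence); // ACGTTTGA
    }
}
```

Supporting a new format then means writing only a new `FormatParser`, and the persistence side stays unchanged. The catch is that anything the shared model has no field for is dropped during parsing.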

Ontologies are a good way of doing it. However, you soon run into the
issue of what makes a good intermediate object layer (i.e. an object
model which can be described using the ontology, and suits every
possible data format without losing any information from any of them).
This is where it gets hard and you have to start compromising, at
which point there's always someone out there that goes 'but why
doesn't it parse my favourite tag?'.

Mark Schreiber has suggested INSDseq as a good intermediate format for
sequence data, but that of course doesn't cater for microarrays etc.

So, I was heading towards making each parser use its own
format-specific object model, within which a user can explore exactly
as if it were the file itself. Each database format supported would
also have its own object model matching the schema. Then, I wanted to
provide a set of translators that can map between them. How those
translators would work is still open; they would probably be written
on a format-to-format basis, one for each pair of formats, but your
ontology method is a possible alternative solution. There's also a tool out there
called Dozer which can translate any bean into any other bean based on
a mapping definition. I was hoping we could make use of that.
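
The per-format approach can be sketched in Java as a registry of translators keyed by source and target model class. This is a hypothetical illustration, not existing BioJava code. `GenbankRecord`, `BioentryRow` and `TranslatorRegistrySketch` are invented names, the fields are placeholders, and hand-written lambdas stand in for whatever mapping mechanism (Dozer or otherwise) would actually be used.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of "one object model per format, plus translators".
public class TranslatorRegistrySketch {

    /** Format-specific model mirroring a flat-file record (placeholder fields). */
    public record GenbankRecord(String locus, String definition, String origin) {}

    /** Database-specific model mirroring a BioSQL-like table (assumption). */
    public record BioentryRow(String name, String description, String seq) {}

    // One translator per (source class, target class) pair.
    private final Map<String, Function<Object, Object>> translators = new HashMap<>();

    private static String key(Class<?> from, Class<?> to) {
        return from.getName() + "->" + to.getName();
    }

    /** Register a translator for one specific source/target combination. */
    public <A, B> void register(Class<A> from, Class<B> to, Function<A, B> fn) {
        translators.put(key(from, to), o -> fn.apply(from.cast(o)));
    }

    /** Translate, failing loudly if no translator exists for this pair. */
    public <A, B> B translate(A source, Class<B> to) {
        Function<Object, Object> fn = translators.get(key(source.getClass(), to));
        if (fn == null) {
            throw new IllegalArgumentException("No translator: " + key(source.getClass(), to));
        }
        return to.cast(fn.apply(source));
    }

    public static void main(String[] args) {
        TranslatorRegistrySketch registry = new TranslatorRegistrySketch();
        registry.register(GenbankRecord.class, BioentryRow.class,
                g -> new BioentryRow(g.locus(), g.definition(), g.origin()));

        BioentryRow row = registry.translate(
                new GenbankRecord("AB000001", "Example entry", "acgt"), BioentryRow.class);
        System.out.println(row.name()); // AB000001
    }
}
```

Each registered function covers one direction of one format pair, so translating freely between N models needs N(N-1) translators. A declarative mapping tool such as Dozer would move those pairings out of code and into mapping definitions.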

So... nice idea, and I can see where you're coming from, but it
suffers the same problem as our existing model does: how do you
define a common ontology/object model that can cater for everything?
Answers on a postcard please...!

cheers,
Richard

2008/7/31 Mark Fortner <phidias51 at gmail.com>:
> Hi Richard,
> I started to write up a use-case, and realized that what I was thinking about
> was more than just a single use-case, and might be beyond the scope of what
> biojava should handle.  So I thought I would ask before writing it up.
>
> It seems that we have a need for a generic parsing and dataloading
> framework.  Something that encompasses not only Mark Schreiber's use-case,
> but could also handle data from a variety of different instrument files --
> everything from MAGE files, ABI trace files, CSV, Excel, or text flat files.
>
> I've been thinking about creating a semantic data loader.  This would be
> part framework and part GUI application.  It would allow the user to map
> elements of data files to an ontology, and also map parts of an ontology to
> different databases.  The user would be able to create a parser definition,
> which would parse and load data, verify the data against rules stored in the
> ontology and then load the data into the database using the data's ontology
> tags as a guide.
>
> A first step in the implementation might be to retrofit the BioSQL and
> parsing code.
>
> Is this within the scope of BioJava?  Is it something that the community
> would find interesting and useful in the everyday work they do?  Is it
> something to which people would like to contribute?
>
> Regards,
>
> --
> Mark Fortner
>
> blog: http://feeds.feedburner.com/jroller/ideafactory
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
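
Mark's proposed semantic loader can be sketched as three lookups applied to each parsed row: input column to ontology term (the parser definition), the term's validation rule, and term to database column. This is a hypothetical Java illustration with an invented term ID (`EX:sequence`), an invented target column (`biosequence.seq`) and an in-memory result instead of a real database write. It is not an existing BioJava or ontology API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: file column -> ontology term -> rule check -> database column.
public class SemanticLoaderSketch {

    /** Ontology term carrying a validation rule (assumed representation). */
    public record Term(String id, Predicate<String> rule) {}

    /** Parser definition: which ontology term each input column is tagged with. */
    private final Map<String, Term> columnToTerm = new LinkedHashMap<>();
    /** Ontology-to-database mapping: term id -> "table.column". */
    private final Map<String, String> termToDbColumn = new LinkedHashMap<>();

    public void mapColumn(String column, Term term) { columnToTerm.put(column, term); }

    public void mapTerm(String termId, String dbColumn) { termToDbColumn.put(termId, dbColumn); }

    /** Validate one parsed row and return its values keyed by target database column. */
    public Map<String, String> load(List<String> header, List<String> values) {
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < header.size(); i++) {
            Term term = columnToTerm.get(header.get(i));
            if (term == null) continue; // untagged column: not loaded
            String value = values.get(i);
            if (!term.rule().test(value)) {
                throw new IllegalArgumentException(term.id() + " rejects value: " + value);
            }
            String target = termToDbColumn.get(term.id());
            if (target != null) out.put(target, value);
        }
        return out;
    }

    public static void main(String[] args) {
        SemanticLoaderSketch loader = new SemanticLoaderSketch();
        loader.mapColumn("Seq", new Term("EX:sequence", v -> v.matches("[ACGTN]+")));
        loader.mapTerm("EX:sequence", "biosequence.seq");

        System.out.println(loader.load(List.of("Id", "Seq"), List.of("r1", "ACGT")));
        // {biosequence.seq=ACGT}
    }
}
```

Columns without an ontology tag are silently skipped here, which is exactly the information-loss question raised above.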

-- 
Richard Holland
Bioinformatics Software Developer
Eagle Genomics
http://www.eaglegenomics.com/