[BioSQL-l] TAXON,TAXON_NAME, was Re: Description

Chris Fields cjfields at uiuc.edu
Thu Sep 13 01:42:03 UTC 2007


In BioPerl versions up to 1.5.1, the Bio::Species class doesn't
implement a specific interface, whereas in 1.5.2 it inherits from
the new Bio::Taxon (and all methods are reimplemented to work with
Bio::Taxon methods).  According to Sendu, the long-term plan was to
eventually deprecate Bio::Species and just use Bio::Taxon, with no
'guessing' of the genus/species, which always borked seqrecord parsing.
That 'guessing' is essentially what is going on with SpeciesAdaptor
now (Sendu's suggestion of 'old behavior', which triggered the
exception in the bug report).
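
To make the 'guessing' concrete, here is a minimal Python sketch (not BioPerl code; both function names are hypothetical) contrasting a naive "first word is the genus, the rest is the species" split with simply keeping the name as given:

```python
from typing import Optional, Tuple


def guess_genus_species(name: str) -> Tuple[str, Optional[str]]:
    """Hypothetical 'old behavior': first word is the genus, the rest the species."""
    parts = name.split(None, 1)
    return parts[0], (parts[1] if len(parts) > 1 else None)


def keep_name_as_is(name: str) -> str:
    """Hypothetical 'no guessing' behavior: keep the scientific name whole."""
    return name.strip()


if __name__ == "__main__":
    for name in ["Homo sapiens",
                 "Escherichia coli K-12",
                 "Human immunodeficiency virus 1"]:
        print(f"{name!r}: guessed={guess_genus_species(name)}, "
              f"as-is={keep_name_as_is(name)!r}")
```

The split looks harmless for "Homo sapiens", but it yields a genus of "Human" for "Human immunodeficiency virus 1" and folds the strain into the species for "Escherichia coli K-12", which is the kind of virus and bacterial name breakage mentioned in the quoted thread.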

I'll try to look into it in a few weeks when I have some more time;
there are a number of bioperl-db bugs in Bugzilla that need sorting
through.  My thought is still to use a transition module
(TaxonAdaptor), which would eventually replace SpeciesAdaptor once
Bio::Species is no more.
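
The lookup order Hilmar describes in the quoted message below (use the NCBI taxon ID if it is already in the taxon table, and only otherwise build a lineage from what the species object provides) can be sketched roughly as follows. This is Python with sqlite3, not bioperl-db code; the table layout is a simplified subset of the BioSQL taxon/taxon_name tables, and the function names and the node-reuse rule (same scientific name under the same parent) are assumptions for illustration.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical, simplified subset of the BioSQL taxon / taxon_name tables
# (the real tables carry more columns, e.g. nested-set left/right values).
SCHEMA = """
CREATE TABLE taxon (
    taxon_id        INTEGER PRIMARY KEY,
    ncbi_taxon_id   INTEGER UNIQUE,
    parent_taxon_id INTEGER,
    node_rank       TEXT
);
CREATE TABLE taxon_name (
    taxon_id   INTEGER NOT NULL,
    name       TEXT NOT NULL,
    name_class TEXT NOT NULL
);
"""


def _find_child(conn: sqlite3.Connection, parent_id: Optional[int],
                name: str) -> Optional[int]:
    # Reuse a node with the same scientific name under the same parent.
    # SQLite's IS operator also matches NULL, i.e. root-level nodes.
    row = conn.execute(
        "SELECT t.taxon_id FROM taxon t "
        "JOIN taxon_name n ON n.taxon_id = t.taxon_id "
        "WHERE n.name_class = 'scientific name' AND n.name = ? "
        "AND t.parent_taxon_id IS ?",
        (name, parent_id),
    ).fetchone()
    return row[0] if row else None


def find_or_create_taxon(conn: sqlite3.Connection, ncbi_taxon_id: int,
                         lineage: List[Tuple[str, str]]) -> int:
    """Return taxon.taxon_id for ncbi_taxon_id.

    lineage holds (rank, scientific name) pairs from the root down to the
    organism; it is only consulted when the NCBI taxon ID is not loaded.
    """
    row = conn.execute(
        "SELECT taxon_id FROM taxon WHERE ncbi_taxon_id = ?", (ncbi_taxon_id,)
    ).fetchone()
    if row is not None:
        # Preloaded NCBI taxonomy: no lineage needs to be built.
        return row[0]
    if not lineage:
        raise ValueError("unknown NCBI taxon ID and no lineage to build from")

    parent_id: Optional[int] = None
    for i, (rank, name) in enumerate(lineage):
        node_id = _find_child(conn, parent_id, name)
        if node_id is None:
            # Only the organism itself (the last entry) gets the NCBI ID.
            is_leaf = i == len(lineage) - 1
            cur = conn.execute(
                "INSERT INTO taxon (ncbi_taxon_id, parent_taxon_id, node_rank) "
                "VALUES (?, ?, ?)",
                (ncbi_taxon_id if is_leaf else None, parent_id, rank),
            )
            node_id = cur.lastrowid
            conn.execute(
                "INSERT INTO taxon_name (taxon_id, name, name_class) "
                "VALUES (?, ?, 'scientific name')",
                (node_id, name),
            )
        parent_id = node_id
    return parent_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Pretend the NCBI taxonomy was preloaded for Homo sapiens (taxon 9606).
    conn.execute("INSERT INTO taxon VALUES (1, 9606, NULL, 'species')")
    conn.execute("INSERT INTO taxon_name VALUES (1, 'Homo sapiens', 'scientific name')")
    print(find_or_create_taxon(conn, 9606, []))  # found directly: 1
    lineage = [("superkingdom", "Bacteria"), ("genus", "Escherichia"),
               ("species", "Escherichia coli")]
    print(find_or_create_taxon(conn, 562, lineage))  # lineage built: 4
```

Reusing nodes by name under the same parent is the weak point of the fallback path: the lineage it builds is only as good as the genus/species values handed over by the species object.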

chris

On Sep 12, 2007, at 6:15 PM, Hilmar Lapp wrote:

> The species/taxon handling shouldn't be a problem if you have the  
> NCBI taxonID and have preloaded the NCBI taxonomy.
>
> However, if it's a new species (i.e., the lookup of the NCBI  
> taxonID in the taxon table fails), then bioperl-db tries to create  
> the lineage based on what it finds in the species object.
>
> As the bug report says, the issue can be fixed, but it also looks  
> like the fix will break compatibility with earlier versions of  
> BioPerl. I think at some point that's fine, but I was wondering  
> whether that's the way it needs to be.
>
> 	-hilmar
>
> On Sep 11, 2007, at 12:16 PM, Chris Fields wrote:
>
>> I think one area of possible headache will be TAXON/TAXON_NAME.  For
>> instance, with BioPerl we kept running into genus/species parsing
>> problems (virus and bacterial names) when going from seqrecord->object.
>> Due to that we decided to greatly simplify Species parsing in BioPerl
>> so there isn't any 'guessing' as to genus/species names; you get
>> what's already there, nothing more.  If one wants extra taxonomic
>> information, then one must use the NCBI Taxonomy somehow.
>>
>> However, bioperl-db currently still splits names into genus/species
>> (it acts like older BioPerl), which obviously clashes with current
>> BioPerl behavior.  Not sure how the other Bio* projects store this
>> data; Richard?
>>
>> There is a BioPerl bug filed on this:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2092
>>
>> chris
>>
>> On Sep 11, 2007, at 10:49 AM, Barry Moore wrote:
>>
>>> Well, the schema is the formal specification as to what goes where,
>>> and as long as your BioJava and BioPerl DB interfaces play by the
>>> rules of the schema, then yes, you should be able to use both
>>> languages on the same database.  Of course the devil is in the
>>> details, and since I've only worked with the BioPerl interface I
>>> don't know if that is in fact the reality right now.  I think what
>>> Richard meant was that there is no detailed human-readable
>>> documentation about which table and column each bit of a GenBank
>>> record goes into.  Paul, I think you will find this document to be
>>> what you are looking for, or at least as good as you'll get: go to
>>> http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biosql-schema/doc/?cvsroot=biosql
>>> and look for schema-overview.txt.  There is also an ERD in PDF
>>> format which can help you get your head around the schema.  If you
>>> end up with specific questions about what's where, send another
>>> e-mail or just load some files and go exploring.
>>>
>>> Barry
>>>
>>> On Sep 11, 2007, at 9:10 AM, Chris Fields wrote:
>>>
>>>> Here's a question I couldn't find the answer to: should any
>>>> BioSQL-loaded data (via BioJava, BioPerl, etc.) be expected to fully
>>>> round-trip across any BioSQL-utilizing language?  In other words, if
>>>> I use BioJava/Hibernate to load sequence data into a BioSQL database
>>>> and use BioPerl to work with the data, can one expect it to work?
>>>>
>>>> My guess is no, as long as there is no formal specification...
>>>>
>>>> chris
>>>>
>>>> On Sep 11, 2007, at 9:54 AM, Richard Holland wrote:
>>>>
>>>>> There is no formal specification for what goes where in BioSQL, but
>>>>> you can refer to the BioJava documentation for a good approximation
>>>>> of where a GenBank file should end up. The BioJava objects share
>>>>> similar names to the BioSQL tables and are mapped using Hibernate.
>>>>>
>>>>> The most useful parts of the docs are probably:
>>>>>
>>>>> http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank
>>>>>
>>>>> and:
>>>>>
>>>>> http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object-relational_mappings
>>>>>
>>>>> cheers,
>>>>> Richard
>>>>>
>>>>> Paul Davis wrote:
>>>>>> I've been going over the BioSQL schema and I was wondering if there
>>>>>> was a good place to read about examples of actual data that goes
>>>>>> into each table. Specifically, I'm a bit confused about which parts
>>>>>> of a GenBank record go in which tables.
>>>>>>
>>>>>> Thanks,
>>>>>> Paul Davis
>>>>>> _______________________________________________
>>>>>> BioSQL-l mailing list
>>>>>> BioSQL-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>>>>
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
