[BioSQL-l] TAXON,TAXON_NAME, was Re: Description

Thu Sep 13 00:19:24 UTC 2007

The code is in bioperl-db (which is a sub-repository of bioperl, as  
is bioperl-live).

It makes no attempt at updating the nested-set values. That raises a
good point: there is currently no script that only updates them. The
load_ncbi_taxonomy.pl script does recompute them, but it also loads or
updates the NCBI taxonomy in the same run. It should be relatively easy
to factor out the nested-set computing code into a separate stand-alone
script.
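For readers new to nested sets: each taxon gets a left_value and a right_value so that a taxon's descendants are exactly the taxa whose left_value lies strictly between its own left_value and right_value. Below is a minimal Python sketch of the recomputation step, assuming you have already read (taxon_id, parent_taxon_id) pairs out of the taxon table yourself. It is not the code in load_ncbi_taxonomy.pl. The function name compute_nested_set is hypothetical, and writing the values back to the database is left out. Nodes whose parent is missing from the map are treated as extra roots, so dangling non-NCBI nodes get their own ranges instead of being skipped.

```python
from collections import defaultdict


def compute_nested_set(parents):
    """Return {taxon_id: (left_value, right_value)} for a parent map.

    parents maps taxon_id -> parent_taxon_id (hypothetical input shape).
    A node is treated as a root if its parent is None, is itself (NCBI's
    root node 1 points to itself), or is not present in the map (e.g. a
    dangling non-NCBI node). Nodes caught in a parent cycle are never
    reached and are absent from the result.
    """
    children = defaultdict(list)
    roots = []
    for node, parent in parents.items():
        if parent is None or parent == node or parent not in parents:
            roots.append(node)
        else:
            children[parent].append(node)

    values = {}
    counter = 1
    for root in sorted(roots):
        # Iterative depth-first walk, avoiding Python's recursion limit.
        # Each stack entry is (node, finished); finished=True means all of
        # the node's children have been numbered and its right value is due.
        stack = [(root, False)]
        while stack:
            node, finished = stack.pop()
            if finished:
                values[node] = (values[node][0], counter)
            else:
                values[node] = (counter, None)
                stack.append((node, True))
                # Push children in reverse so they are numbered in ascending id order.
                for child in sorted(children[node], reverse=True):
                    stack.append((child, False))
            counter += 1
    return values


if __name__ == "__main__":
    # Tiny tree: 1 is the root, 2 and 3 are its children, 4 is a child of 2.
    print(compute_nested_set({1: 1, 2: 1, 3: 1, 4: 2}))
    # {1: (1, 8), 2: (2, 5), 4: (3, 4), 3: (6, 7)}
```

Once the numbers are in place, the descendants of a taxon can be selected with a simple range condition on left_value; a node inserted later without recomputing would have no values and so would fall outside such queries.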

	-hilmar

On Sep 12, 2007, at 8:13 PM, Paul Davis wrote:

> I glanced through the bioperl cvs a bit but couldn't find the part
> where it tries to load a new taxonomy name. Does this go and try to
> rebuild the nested sets information, or basically leave any inserted
> taxonomic data (non-NCBI data) as nodes dangling outside the nested
> sets information?
>
> Paul
>
> On 9/12/07, Hilmar Lapp <hlapp at gmx.net> wrote:
>> The species/taxon handling shouldn't be a problem if you have the
>> NCBI taxonID and have preloaded the NCBI taxonomy.
>>
>> However, if it's a new species (i.e., the lookup of the NCBI taxonID
>> in the taxon table fails), then bioperl-db tries to create the
>> lineage based on what it finds in the species object.
>>
>> As the bug report says, the issue can be fixed, but it also looks
>> like the fix will break compatibility with earlier versions of
>> BioPerl. I think at some point that's fine, but I was wondering
>> whether that's the way it needs to be.
>>
>>         -hilmar
>>
>> On Sep 11, 2007, at 12:16 PM, Chris Fields wrote:
>>
>>> I think one area of possible headache will be TAXON/TAXON_NAME.  For
>>> instance, with BioPerl we kept running into genus/species parsing
>>> problems (virus, bacterial names) when going from seqrecord->object.
>>> Due to that we decided to greatly simplify Species parsing in Bioperl
>>> so there isn't any 'guessing' as to genus/species names; you get
>>> what's already there, nothing more.  If one wants extra taxonomic
>>> information then one must use NCBI Taxonomy somehow.
>>>
>>> However, currently bioperl-db still splits into genus/species (acts
>>> like older BioPerl), which obviously clashes with current Bioperl
>>> behavior.  Not sure how the other Bio* store this data; Richard?
>>>
>>> There is a BioPerl bug filed on this:
>>>
>>> http://bugzilla.open-bio.org/show_bug.cgi?id=2092
>>>
>>> chris
>>>
>>> On Sep 11, 2007, at 10:49 AM, Barry Moore wrote:
>>>
>>>> Well, the schema is the formal specification of what goes where,
>>>> and as long as your BioJava and BioPerl DB interfaces play by the
>>>> rules of the schema, then yes, you should be able to use both
>>>> languages on the same database.  Of course the devil is in the
>>>> details, and since I've only worked with the BioPerl interface I
>>>> don't know if that is in fact reality right now.  I think what
>>>> Richard meant was that there is no detailed human-readable
>>>> documentation about where each bit of a GenBank record goes (which
>>>> table and column).  Paul, I think you will find this document to be
>>>> what you are looking for, or at least as good as you'll get: go to
>>>> http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biosql-schema/doc/?cvsroot=biosql
>>>> and look for schema-overview.txt.  There is also an ERD in PDF
>>>> format which can help you get your head around the schema.  If you
>>>> end up with specific questions about what's where, send another
>>>> e-mail or just load some files and go exploring.
>>>>
>>>> Barry
>>>>
>>>> On Sep 11, 2007, at 9:10 AM, Chris Fields wrote:
>>>>
>>>>> Here's a question I couldn't find the answer to: should any
>>>>> BioSQL-loaded data (via BioJava, BioPerl, etc.) be expected to fully
>>>>> round-trip across any BioSQL-utilizing language?  In other words, if
>>>>> I use BioJava/Hibernate to load sequence data into a BioSQL database
>>>>> and use BioPerl to work with the data, can one expect it to work?
>>>>>
>>>>> My guess is no, as long as there is no formal specification...
>>>>>
>>>>> chris
>>>>>
>>>>> On Sep 11, 2007, at 9:54 AM, Richard Holland wrote:
>>>>>
>>>>>>
>>>>>> There is no formal specification for what goes where in BioSQL, but
>>>>>> you can refer to the BioJava documentation for a good approximation
>>>>>> of where a GenBank file should end up. The BioJava objects share
>>>>>> similar names to the BioSQL tables and are mapped using Hibernate.
>>>>>>
>>>>>> The most useful parts of the docs are probably:
>>>>>>
>>>>>> http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank
>>>>>>
>>>>>> and:
>>>>>>
>>>>>> http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object-relational_mappings
>>>>>>
>>>>>> cheers,
>>>>>> Richard
>>>>>>
>>>>>> Paul Davis wrote:
>>>>>>> I've been going over the BioSQL schema and I was wondering if
>>>>>>> there was a good place to read about examples of actual data that
>>>>>>> goes into each table. Specifically, I'm a bit confused about which
>>>>>>> parts of a GenBank record go in which tables.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Paul Davis
>>>>>>> _______________________________________________
>>>>>>> BioSQL-l mailing list
>>>>>>> BioSQL-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>>>>>
>>>>>
>>>>> Christopher Fields
>>>>> Postdoctoral Researcher
>>>>> Lab of Dr. Robert Switzer
>>>>> Dept of Biochemistry
>>>>> University of Illinois Urbana-Champaign
>>>>>
>>>>
>>>
>>
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================