[BioSQL-l] TAXON,TAXON_NAME, was Re: Description

Thu Sep 13 00:24:08 UTC 2007

I was more wondering whether there was an efficient way to recompute that
information. As you seem to be confirming, I was fairly certain that
updating those values would require recalculating all of them.
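To make the cost concrete, here is a minimal Python sketch of the textbook nested-set insertion of a single new leaf. It assumes an in-memory dict standing in for the `left_value`/`right_value` columns of the BioSQL `taxon` table. `insert_leaf` is a hypothetical helper, not bioperl-db code. It avoids a full recompute, but every value at or beyond the parent's `right_value` shifts by 2. Inserting near the start of the numbering therefore rewrites almost every row.

```python
def insert_leaf(values, new_node, parent):
    """Add new_node as the last child of parent (hypothetical helper).

    values maps node -> (left_value, right_value) and is updated in place.
    """
    p_right = values[parent][1]
    # Open a gap of 2 at the parent's right edge by shifting everything to its right.
    for node, (lft, rgt) in list(values.items()):
        values[node] = (
            lft + 2 if lft > p_right else lft,
            rgt + 2 if rgt >= p_right else rgt,
        )
    # The new leaf takes the freed slot just inside the parent's right edge.
    values[new_node] = (p_right, p_right + 1)


tree = {1: (1, 8), 2: (2, 5), 4: (3, 4), 3: (6, 7)}
insert_leaf(tree, 5, parent=2)
print(tree)  # {1: (1, 10), 2: (2, 7), 4: (3, 4), 3: (8, 9), 5: (5, 6)}
```

Only node 5 was added, yet nodes 1, 2 and 3 all had values rewritten.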

Paul

On 9/12/07, Hilmar Lapp <hlapp at gmx.net> wrote:
> The code is in bioperl-db (which is a sub-repository of bioperl, as
> is bioperl-live).
>
> It makes no attempt at updating the nested-set values. That raises a
> good point - there is currently no script that would update that; the
> load_ncbi_taxonomy.pl script does recompute it, but it also loads or
> updates the NCBI taxonomy at the same time. It should be relatively easy to
> factor out the nested-set computing code into a separate stand-alone
> script.
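The stand-alone recompute described above might look roughly like the following Python sketch. It is not the code in load_ncbi_taxonomy.pl. It assumes the parent pointers (the `parent_taxon_id` column of the `taxon` table) have already been read into a dict. `compute_nested_sets` and `descendants` are hypothetical names. A single depth-first walk numbers each node once on entry and once on exit, so the cost is linear in the number of taxa.

```python
from collections import defaultdict


def compute_nested_sets(parent_of):
    """Return {taxon_id: (left_value, right_value)} for a forest of taxa.

    parent_of maps taxon_id -> parent_taxon_id. A parent of None, or a node
    that is its own parent (as the NCBI root is), marks a root.
    """
    children = defaultdict(list)
    roots = []
    for node, parent in parent_of.items():
        if parent is None or parent == node:
            roots.append(node)
        else:
            children[parent].append(node)

    left, right = {}, {}
    counter = 1
    for root in sorted(roots):
        # Iterative depth-first walk; avoids Python's recursion limit on deep lineages.
        stack = [(root, False)]
        while stack:
            node, finished = stack.pop()
            if finished:
                right[node] = counter
            else:
                left[node] = counter
                stack.append((node, True))
                # Push children in reverse so the smallest id is numbered first.
                for child in sorted(children[node], reverse=True):
                    stack.append((child, False))
            counter += 1
    # Nodes whose parent is missing from parent_of are never reached and get no values.
    return {node: (left[node], right[node]) for node in left}


def descendants(values, taxon_id):
    """All taxa in the subtree rooted at taxon_id, including itself."""
    lo, hi = values[taxon_id]
    return sorted(n for n, (lft, _) in values.items() if lo <= lft <= hi)


values = compute_nested_sets({1: 1, 2: 1, 3: 1, 4: 2})
print(values)                  # {1: (1, 8), 2: (2, 5), 4: (3, 4), 3: (6, 7)}
print(descendants(values, 2))  # [2, 4]
```

The numbering pays off because a subtree query becomes a range check on `left_value`, which `descendants` performs here in memory.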
>
>         -hilmar
>
> On Sep 12, 2007, at 8:13 PM, Paul Davis wrote:
>
> > I glanced through the bioperl cvs a bit but couldn't find the part
> > where it tries to load a new taxonomy name. Does this go and try to
> > rebuild the nested sets information, or basically leave any inserted
> > taxonomic data (non-NCBI data) as nodes dangling outside the nested
> > sets information?
> >
> > Paul
> >
> > On 9/12/07, Hilmar Lapp <hlapp at gmx.net> wrote:
> >> The species/taxon handling shouldn't be a problem if you have the
> >> NCBI taxonID and have preloaded the NCBI taxonomy.
> >>
> >> However, if it's a new species (i.e., the lookup of the NCBI taxonID
> >> in the taxon table fails), then bioperl-db tries to create the
> >> lineage based on what it finds in the species object.
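The fallback can be pictured with a small Python sketch. `TaxonStore` and `resolve` are hypothetical and operate on dicts. The real bioperl-db logic works against the database and the Species object. Keying nodes by scientific name alone is a simplification, since real taxon names are not guaranteed to be unique. The sketch first tries the preloaded NCBI taxon ID and otherwise creates any missing lineage nodes from root to species. Nodes created this way receive no nested-set values, which is the gap Paul asks about.

```python
class TaxonStore:
    """Hypothetical in-memory stand-in for the taxon/taxon_name tables."""

    def __init__(self):
        self.id_by_ncbi = {}   # ncbi_taxon_id -> taxon_id (preloaded NCBI data)
        self.id_by_name = {}   # scientific name -> taxon_id (names assumed unique)
        self.parent_of = {}    # taxon_id -> parent taxon_id (None for a root)
        self._next_id = 1

    def resolve(self, ncbi_taxon_id, lineage):
        """Return a taxon_id, creating lineage nodes only if the NCBI lookup fails.

        lineage lists names from the root down to the species.
        """
        if ncbi_taxon_id in self.id_by_ncbi:
            return self.id_by_ncbi[ncbi_taxon_id]
        parent = None
        for name in lineage:
            taxon_id = self.id_by_name.get(name)
            if taxon_id is None:
                # Unknown name: create a new node hanging off the previous level.
                taxon_id = self._next_id
                self._next_id += 1
                self.id_by_name[name] = taxon_id
                self.parent_of[taxon_id] = parent
            parent = taxon_id
        if ncbi_taxon_id is not None and parent is not None:
            self.id_by_ncbi[ncbi_taxon_id] = parent
        return parent


store = TaxonStore()
ecoli = store.resolve(None, ["Bacteria", "Proteobacteria", "Escherichia coli"])
bsub = store.resolve(None, ["Bacteria", "Firmicutes", "Bacillus subtilis"])
print(ecoli, bsub, store.parent_of)  # 3 5 {1: None, 2: 1, 3: 2, 4: 1, 5: 4}
```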
> >>
> >> As the bug report says, the issue can be fixed, but it also looks
> >> like the fix will break compatibility with earlier versions of
> >> BioPerl. I think at some point that's fine, but I was wondering
> >> whether that's the way it needs to be.
> >>
> >>         -hilmar
> >>
> >> On Sep 11, 2007, at 12:16 PM, Chris Fields wrote:
> >>
> >>> I think one area of possible headache will be TAXON/TAXON_NAME.  For
> >>> instance, with BioPerl we kept running into genus/species parsing
> >>> problems (virus, bacterial names) when going from seqrecord->object.
> >>> Due to that we decided to greatly simplify Species parsing in Bioperl
> >>> so there isn't any 'guessing' as to genus/species names; you get
> >>> what's already there, nothing more.  If one wants extra taxonomic
> >>> information then one must use NCBI Taxonomy somehow.
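A toy Python illustration of why guessing goes wrong. `naive_split` is hypothetical and is not the actual BioPerl or bioperl-db parser. Splitting on the first space works for a clean binomial. For a virus name it produces a bogus genus, and for a strain designation it folds the strain into the species epithet.

```python
def naive_split(name):
    """Hypothetical older-style guess: first word = genus, remainder = species."""
    genus, _, species = name.partition(" ")
    return genus, species


for organism in ["Homo sapiens", "Tobacco mosaic virus", "Escherichia coli K-12"]:
    print(organism, "->", naive_split(organism))
# Homo sapiens -> ('Homo', 'sapiens')
# Tobacco mosaic virus -> ('Tobacco', 'mosaic virus')
# Escherichia coli K-12 -> ('Escherichia', 'coli K-12')
```

Keeping the organism string as given and taking genus or rank information from NCBI Taxonomy avoids both failure modes.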
> >>>
> >>> However, currently bioperl-db still splits into genus/species (acts
> >>> like older BioPerl), which obviously clashes with current Bioperl
> >>> behavior.  Not sure how the other Bio* store this data; Richard?
> >>>
> >>> There is a BioPerl bug filed on this:
> >>>
> >>> http://bugzilla.open-bio.org/show_bug.cgi?id=2092
> >>>
> >>> chris
> >>>
> >>> On Sep 11, 2007, at 10:49 AM, Barry Moore wrote:
> >>>
> >>>> Well, the schema is the formal specification as to what goes where
> >>>> and as long as your BioJava and BioPerl DB interface plays by the
> >>>> rules of the schema, then yes you should be able to use both
> >>>> languages on the same database.  Of course the devil is in the
> >>>> details and since I've only worked with the BioPerl interface I
> >>>> don't know if that is in fact reality right now.  I think what
> >>>> Richard meant was there is not detailed human documentation about
> >>>> where each bit of a GenBank record goes into what table and
> >>>> column.  Paul, I think you will find this document to be what you
> >>>> are looking for - or at least as good as you'll get: go to
> >>>> http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biosql-schema/doc/?cvsroot=biosql
> >>>> and look for schema-overview.txt.  There is also an ERD in PDF
> >>>> format which can help you get your head around the schema.  If
> >>>> you end up with specific questions about what's where, send
> >>>> another e-mail or just load some files and go exploring.
> >>>>
> >>>> Barry
> >>>>
> >>>> On Sep 11, 2007, at 9:10 AM, Chris Fields wrote:
> >>>>
> >>>>> Here's a question I couldn't find the answer to: should any
> >>>>> BioSQL-loaded data (via BioJava, BioPerl, etc.) be expected to
> >>>>> fully round-trip across any BioSQL-utilizing language?  In other
> >>>>> words, if I use BioJava/Hibernate to load sequence data into a
> >>>>> BioSQL database and use BioPerl to work with the data, can one
> >>>>> expect it to work?
> >>>>>
> >>>>> My guess is no, as long as there is no formal specification...
> >>>>>
> >>>>> chris
> >>>>>
> >>>>> On Sep 11, 2007, at 9:54 AM, Richard Holland wrote:
> >>>>>
> >>>>>> There is no formal specification for what goes where in BioSQL,
> >>>>>> but you can refer to the BioJava documentation for a good
> >>>>>> approximation of where a GenBank file should end up. The BioJava
> >>>>>> objects share similar names to the BioSQL tables and are mapped
> >>>>>> using Hibernate.
> >>>>>>
> >>>>>> The most useful parts of the docs are probably:
> >>>>>>
> >>>>>> http://biojava.org/wiki/BioJava:BioJavaXDocs#GenBank
> >>>>>>
> >>>>>> and:
> >>>>>>
> >>>>>> http://biojava.org/wiki/BioJava:BioJavaXDocs#Hibernate_object-relational_mappings.
> >>>>>>
> >>>>>> cheers,
> >>>>>> Richard
> >>>>>>
> >>>>>> Paul Davis wrote:
> >>>>>>> I've been going over the biosql schema and I was wondering if
> >>>>>>> there was a good place to read about examples of actual data
> >>>>>>> that goes into each table. Specifically, I'm a bit confused
> >>>>>>> about which parts of a genbank record go in which tables.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Paul Davis
> >>>>>>> _______________________________________________
> >>>>>>> BioSQL-l mailing list
> >>>>>>> BioSQL-l at lists.open-bio.org
> >>>>>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
> >>>>>>>
> >>>>>
> >>>>> Christopher Fields
> >>>>> Postdoctoral Researcher
> >>>>> Lab of Dr. Robert Switzer
> >>>>> Dept of Biochemistry
> >>>>> University of Illinois Urbana-Champaign
> >>>>
> >>>
> >>
> >> --
> >> ===========================================================
> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >> ===========================================================
> >>
>