[Biopython-dev] GSoC python variant update 5

Tue Jun 19 16:23:01 UTC 2012

Lenna,

One concern I have, which your narrow-table schema may already address,
is how well the current structure can accommodate the inevitable updates
to the VCF format.  This may show my inexperience with SQL, but is a SQL
backend flexible enough to adopt new conventions while also maintaining
backward compatibility?
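To make the question concrete, here is a minimal sketch of what I take a
"narrow" layout to mean (my assumption; your actual schema may differ):
one row per variant plus one row per INFO key/value pair, using Python's
built-in sqlite3.  The table names, columns, and the NEWKEY field are all
hypothetical.  The appeal is that a key added in a later VCF version is
just another row, with no ALTER TABLE.  The catch is that every value is
stored as TEXT, so the Number/Type declarations from the ##INFO header
lines have to be handled somewhere else.

```python
import sqlite3

# Hypothetical "narrow" layout: one row per (variant, INFO key, value).
# This is an illustrative assumption, not necessarily the proposed schema.
SCHEMA = """
CREATE TABLE variant (
    id    INTEGER PRIMARY KEY,
    chrom TEXT NOT NULL,
    pos   INTEGER NOT NULL,
    ref   TEXT NOT NULL,
    alt   TEXT NOT NULL
);
CREATE TABLE info (
    variant_id INTEGER NOT NULL REFERENCES variant(id),
    key        TEXT NOT NULL,
    value      TEXT
);
"""


def load_record(conn, chrom, pos, ref, alt, info):
    """Insert one record; `info` maps INFO keys to their string values."""
    cur = conn.execute(
        "INSERT INTO variant (chrom, pos, ref, alt) VALUES (?, ?, ?, ?)",
        (chrom, pos, ref, alt),
    )
    conn.executemany(
        "INSERT INTO info (variant_id, key, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, k, v) for k, v in info.items()],
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_record(conn, "20", 14370, "G", "A", {"DP": "14", "AF": "0.5"})
# A key introduced by a later VCF version is just another row: no ALTER TABLE.
load_record(conn, "20", 17330, "T", "A", {"DP": "11", "NEWKEY": "1"})

# Values come back as TEXT; any typing has to be applied by the caller.
rows = conn.execute(
    "SELECT v.pos, i.value FROM variant v JOIN info i ON i.variant_id = v.id "
    "WHERE i.key = 'DP' ORDER BY v.pos"
).fetchall()
print(rows)  # [(14370, '14'), (17330, '11')]
```

Whether old queries keep working as keys come and go is then a question
for the query layer rather than the table definitions.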

Also, from a usage standpoint, I wouldn't want both a VCF file and a
database file on my drive; it would be redundant for me.  It may just be
my style, but I usually sieve the useful information out of a VCF file
into several smaller, more specific VCF files.  Really, what a VCF parser
does is make your output more concise.  I wouldn't then want another .db
file every time I parse my VCF file into a smaller chunk.
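For comparison, the pure-Python workflow I mean looks roughly like this:
a streaming filter that copies the header and writes only the matching
records to a new, smaller VCF.  This is a sketch, not an existing
Biopython API; filter_vcf and passes are hypothetical names, and the
"FILTER is PASS and QUAL >= 20" criterion is just an example.  It assumes
tab-delimited data lines in the standard column order (CHROM, POS, ID,
REF, ALT, QUAL, FILTER, INFO, ...).

```python
import io


def filter_vcf(src, dst, keep):
    """Copy header lines, and only those data lines for which keep(fields) is true.

    `keep` receives the tab-split columns of one data line.
    Returns the number of data lines written.
    """
    kept = 0
    for line in src:
        if line.startswith("#"):
            dst.write(line)  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if keep(fields):
            dst.write(line)
            kept += 1
    return kept


def passes(fields):
    # Hypothetical criterion: FILTER is PASS and QUAL (column 6) is at least 20.
    return fields[6] == "PASS" and fields[5] != "." and float(fields[5]) >= 20


vcf_text = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tDP=14\n"
    "20\t17330\t.\tT\tA\t3\tq10\tDP=11\n"
)
out = io.StringIO()
n = filter_vcf(io.StringIO(vcf_text), out, passes)
print(n)  # 1
```

On real files you would pass open handles instead, e.g.
`with open("in.vcf") as src, open("pass.vcf", "w") as dst:`, and memory
use stays flat no matter how large the input is.  The output is still a
VCF, so there is no second file format to keep in sync.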

Additionally, any time you gain in filtering by using a SQL backend may be
negligible by the time the user reaches this stage, because the file sizes
will be substantially smaller.  In short, I think you might be
over-engineering this.  Keeping a SQL backend is going to require
re-indexing after updates (how long will that take, and is the time
comparable to using pure Python?  You also have the issue where the query
planner decides to ignore your index), as well as writing queries that may
be optimal for some use cases and poor for others.
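On the index point, SQLite at least lets you check what the planner will
do, via EXPLAIN QUERY PLAN.  A minimal sketch, assuming a hypothetical
variant table with a composite index on (chrom, pos): a region query that
constrains the leading column can seek on the index, while a
position-only query typically cannot use it at all.  The exact wording of
the plan text differs between SQLite versions, so I haven't shown
expected output.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' strings SQLite reports for a query's plan."""
    # The detail text is the last column in both older and newer SQLite versions.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variant (chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, qual REAL)"
)
# Typically built (or rebuilt) after a bulk load; this is the maintenance cost.
conn.execute("CREATE INDEX idx_chrom_pos ON variant (chrom, pos)")

# Leading index column is constrained: the planner can seek on the index.
by_region = query_plan(
    conn,
    "SELECT * FROM variant WHERE chrom = ? AND pos BETWEEN ? AND ?",
    ("20", 14000, 18000),
)
# Leading column is unconstrained: the same index is not used for a seek.
by_pos_only = query_plan(
    conn, "SELECT * FROM variant WHERE pos BETWEEN ? AND ?", (14000, 18000)
)
print(by_region)
print(by_pos_only)
```

So every query shape a user might reasonably write needs checking (or its
own index), which is part of the maintenance burden I'm worried about.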

You may have already thought about these concerns, and I don't mean to
deter your efforts; you may be a SQL guru for all I know (and I may just
be biased by how I work).

Chris

On Tue, Jun 19, 2012 at 8:52 AM, Reece Hart <reece at harts.net> wrote:

> On Tue, Jun 19, 2012 at 5:51 AM, Reece Hart <reece at harts.net> wrote:
>
> > (GFF and HGVS have been discussed
>
>
> Ooops. I meant GVF, but the point is the same.
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>